[Python-Dev] Looking for a developer who will work with me for at least 6 months to fix NumPy's dtype system.

2015-09-12 Thread Travis Oliphant
Hi all,

Apologies for cross-posting, but I need to get the word out and Twitter
doesn't provide enough explanation.

I've been working on a second edition of my "Guide to NumPy" book.   It's
been a time-pressured activity, but it's helped me put more meat around my
ideas for how to fix NumPy's dtype system -- which I've been contemplating
off and on for 8 years.

I'm pretty sure I know exactly how to do it --- in a way that fits more
cleanly into Python.  It will take 3-6 months and will have residual
efforts needed that will last another 6 months --- making more types
available with NumPy, improving calculations etc.

This work will be done completely in public view and allow for public
comment.   It will not solve *all* of NumPy's problems, but it will put
NumPy's dtype system on the footing it should, in retrospect, have been
on in the first place (if I had known then what I know now).

It won't be a grandiose rewrite.   It will be a pretty surgical fix to a
few key places in the code. However, it will break the ABI and require
recompilation of NumPy extensions (and so would need to be called NumPy
2.0).   This is unavoidable, but I don't see any problem with breaking the
ABI today given how easy it is to get distributions of Python these days
from a variety of sources (including using conda --- but not only using
conda).

For those that remember what happened in Python dev land, the changes will
be similar to when Guido changed Python 1.5.2 to Python 2.0.

I can mentor and work closely with someone who will work on this and we
will invite full participation and feedback from whoever in the community
also wants to participate --- but I can't do it myself full time (and it
needs someone full time+).   Fortunately, I can pay someone to do it if
they are willing to commit at least 6 months (it is not required to work at
Continuum for this, but you can have a job at Continuum if you want one).

I'm only looking for people who have enough experience with C or preferably
the Python C-API. You also have to *want* to work on this.   You need to be
willing to work with me on the project directly and work to have a
mind-meld with my ideas, which will undoubtedly give rise to additional
perspectives and ideas for later work for you.

When I wrote NumPy 1.0, I put in 80+ hour weeks for about 6 months or more
and then 60+ hour weeks for another year.  I was pretty obsessed with it.   This
won't need quite that effort, but it will need something like it. Being
able to move to Austin is a plus but not required.   I can sponsor a visa
for the right candidate as well (though it's not guaranteed you will get
one with the immigration policies what they are).

This is a labor of love for so many of us and my desire to help the dtype
situation in NumPy comes from the same space that my desire to work on
NumPy in the first place came.  I will be interviewing people to work
on this as not everyone who may want to will really be qualified to do it
--- especially with so many people writing Cython these days instead of
good-ole C-API code :-)

Feel free to spread the news to anyone you can.   I won't say more until
I've found someone to work with me on this --- because I won't have the
time to follow up with any questions or comments.  Even if I can't find
someone I will publish the ideas --- but that also takes time and effort
that is in short supply for me right now.

If there is someone willing to fund this work, please let me know as well
-- that could free up more of my time.

Best,

-Travis


-- 

*Travis Oliphant*
*Co-founder and CEO*


@teoliphant
512-222-5440
http://www.continuum.io


[Python-Dev] The changes I am planning to NumPy I'd like to make only available on Python 3

2015-09-12 Thread Travis Oliphant
In case it affects anyone's interest level: my intention would be to make
these changes to NumPy available only on Python 3, as a way to help continue
adoption of Python 3.

-Travis


-- 

*Travis Oliphant*
*Co-founder and CEO*


@teoliphant
512-222-5440
http://www.continuum.io


[Python-Dev] The process I intend to follow for any proposed changes to NumPy

2015-09-13 Thread Travis Oliphant
Hey all,

I just wanted to clarify that I am very excited about a few ideas I have
--- but I don't have time myself to engage in the community process to get
these changes into NumPy. However, those are real processes --- I've
been coaching a few people in those processes for the past several years
already.

So, rather than do nothing, what I'm looking to do is to work with a few
people who I can share my ideas with, get excited about the ideas, and then
who will work with the community to get them implemented.   That's what I
was announcing and talking about yesterday --- looking for interested
people who want to work on NumPy *with* the NumPy community.

In my enthusiasm, I realize that some may have misunderstood my
intention.  There is no 'imminent' fork, nor am I planning on doing some
crazy amount of work that I then try to force on other developers of NumPy.


What I'm planning to do is find people to train on NumPy code base (people
to increase the diversity of the developers would be ideal -- but hard to
accomplish).  I plan to train them on NumPy based on my experience, and on
what I think should be done --- and then have *them* work through the
community process and engage with others to get consensus (hopefully not
losing too much in translation in the process --- but instead getting even
better).

During that process I will engage as a member of the community and help
write NEPs and other documents and help clarify where it makes sense as I
can.   I will be filtering for people that actually want to see NumPy get
better.  Until I identify the people and work with them, it will be hard
to tell how this will best work.   So, stay tuned.

If all goes well, what you should see in a few weeks time are specific
proposals, a branch or two, and the beginnings of some pull requests.  If
you don't see that, then I will not have found the right people to help me,
and we will all go back to searching.

While I'm expecting the best, in the worst case, we get additional people
who know the NumPy code base and can help squash bugs as well as implement
changes that are desired.  Three things are needed if you want to
participate in this:  1) A willingness to work with the open source
community, 2) a deep knowledge of C and in particular CPython's brand of C,
and 3) a willingness to engage with me, do a mind-meld and dump around the
NumPy code base, and then improve on what is in my head with the rest of
the community.

Thanks,

-Travis


[Python-Dev] A new webpage promoting Compiler technology for CPython

2013-02-15 Thread Travis Oliphant
Hey all, 

With Numba and Blaze we have been doing a lot of work on what essentially is 
compiler technology and realizing more and more that we are treading on ground 
that has been plowed before with many other projects.   So, we wanted to create 
a web-site and perhaps even a mailing list or forum where people could 
coordinate and communicate about compiler projects, compiler tools, and ways to 
share efforts and ideas.

The website is:  http://compilers.pydata.org/

This page is specifically for Compiler projects that either integrate with or 
work directly with the CPython run-time which is why PyPy is not presently 
listed.  The PyPy project is a great project but we just felt that we wanted to 
explicitly create a collection of links to compilation projects that are 
accessible from CPython which are likely less well known.

But that is just where we started from.   The website is intended to be a 
community website constructed from a github repository.   So, we welcome pull 
requests from anyone who would like to see the website updated to reflect their 
related project.  Jon Riehl (Mython, PyFront, ROFL, and many other 
interesting projects) and Stephen Diehl (Blaze) and I will be moderating the 
pull requests to begin with.   But, we welcome others with similar interests to 
participate in that effort of moderation.

The github repository is here:  https://github.com/pydata/compilers-webpage

This is intended to be a community website for information spreading, and so we 
welcome any and all contributions.  

Thank you,

Travis Oliphant




Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Travis Oliphant
Nick Coghlan wrote:
> David Hopwood wrote:
>> Armin Rigo wrote:
>>> Hi,
>>>
>>> There is an oversight in the design of __index__() that only just
>>> surfaced :-(  It is responsible for the following behavior, on a 32-bit
>>> machine with >= 2GB of RAM:
>>>
>>> >>> s = 'x' * (2**100)   # works!
>>> >>> len(s)
>>> 2147483647
>>>
>>> This is because PySequence_Repeat(v, w) works by applying 
>>> w.__index__ in
>>> order to call v->sq_repeat.  However, __index__ is defined to clip the
>>> result to fit in a Py_ssize_t.
>>
>> Clipping the result sounds like it would *never* be a good idea. What 
>> was
>> the rationale for that? It should throw an exception.
>
> A simple demonstration of the clipping behaviour that works on 
> machines with limited memory:
>
> >>> (2**100).__index__()
> 2147483647
> >>> (-2**100).__index__()
> -2147483648
>
> PEP 357 doesn't even mention the issue, and the comment on long_index 
> in the code doesn't give a rationale - it just notes that the function 
> clips the result.
I can't think of the rationale so it was probably an unclear one and 
should be thought of as a bug.  The fact that it isn't discussed in the 
PEP means it wasn't thought about clearly.  I think I had the vague idea 
that __index__() should always succeed.  But, this shows a problem with 
that notion.
>
> I'm inclined to call it a bug, too, but I've cc'ed Travis to see if he 
> can shed some light on the question - the implementation of long_index 
> explicitly suppresses the overflow error generated by 
> _long_as_ssize_t, so the current behaviour appears to be deliberate.

If it was deliberate, it was a hurried decision and one that should be 
re-thought and probably changed.  I think the idea came from the fact 
that out-of-bounds slicing returns empty lists and since __index__ was 
primarily developed to allow integer-like objects to be used in slicing 
it adopted that behavior.  In fact it looks like the comment above 
_long_index contains words from the comment above _PyEval_SliceIndex 
showing the direct borrowing of the idea.   But, _long_index is clearly 
the wrong place to handle the situation since it is used by more than 
just the slicing code.  An error return is already handled by the 
_PyEval_SliceIndex code anyway.

I say it's a bug that should be fixed.  Don't clear the error, raise it.
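
To make the clipping concrete, here is a sketch in today's terms (the
clipped values assume a 32-bit build of that era; current CPython raises
instead of clipping):

    big = 2**100

    # Buggy 2.5-era behaviour on a 32-bit build (from Armin's report):
    #   big.__index__()  ->  2147483647, silently clipped to PY_SSIZE_T_MAX
    #   'x' * big        ->  a bogus string of length 2147483647
    # Desired behaviour (and what current CPython does):
    try:
        s = 'x' * big
    except (OverflowError, MemoryError) as e:
        print("repeat count refused:", e)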


-Travis



Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-08-02 Thread Travis Oliphant
Nick Coghlan wrote:
> 
> 
> One possibility would be to invert the sense of that flag and call it 
> "typeerror", which probably more accurately reflects what it's intended for - 
> it's a way of telling the function "if this object does not have the correct 
> type, tell me by setting this flag instead of by setting the Python error 
> state".

+1 on changing the name (type_error might be a better spelling)

-Travis



Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-08-03 Thread Travis Oliphant
Nick Coghlan wrote:
> Nick Coghlan wrote:
>> Armin Rigo wrote:
>>> Hi,
>>>
>>> There is an oversight in the design of __index__() that only just
>>> surfaced :-(  It is responsible for the following behavior, on a 32-bit
>>> machine with >= 2GB of RAM:
>>>
>>> >>> s = 'x' * (2**100)   # works!
>>> >>> len(s)
>>> 2147483647
>>>
>>> This is because PySequence_Repeat(v, w) works by applying 
>>> w.__index__ in
>>> order to call v->sq_repeat.  However, __index__ is defined to clip the
>>> result to fit in a Py_ssize_t.  This means that the above problem 
>>> exists
>>> with all sequences, not just strings, given enough RAM to create such
>>> sequences with 2147483647 items.
>>>
>>> For reference, in 2.4 we correctly get an OverflowError.
>>>
>>> Argh!  What should be done about it?
>>
>> I've now got a patch on SF that aims to fix this properly [1].
>
> I revised this patch to further reduce the code duplication associated 
> with the indexing code in the standard library.
>
> The patch now has three new functions in the abstract C API:
>
>   PyNumber_Index (used in a dozen or so places)
> - raises IndexError on overflow
>   PyNumber_AsSsize_t (used in 3 places)
> - raises OverflowError on overflow
>   PyNumber_AsClippedSsize_t() (used once, by _PyEval_SliceIndex)
> - clips to PY_SSIZE_T_MIN/MAX on overflow
>
> All 3 have an int * output argument allowing type errors to be flagged 
> directly to the caller rather than through PyErr_Occurred().
>
> Of the 3, only PyNumber_Index is exposed through the operator module.
>
> Probably the most interesting thing now would be for Travis to review 
> it, and see whether it makes things easier to handle for the Numeric 
> scalar types (given the amount of code the patch deleted from the 
> builtin and standard library data types, hopefully the benefits to 
> Numeric will be comparable).


I noticed most of the checks for PyInt were removed in the patch.  If I 
remember correctly, I left these in for "optimization."   Other than 
that, I think the patch is great.

As far as helping with NumPy,  I think it will help to be able to remove 
special-checks for all the different integer-types.  But, this has not 
yet been done in the NumPy code.
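
The Python-level face of PyNumber_Index eventually landed as
operator.index; a small sketch of its semantics on current CPython:

    import operator

    # Accepts anything implementing __index__, raises TypeError otherwise.
    print(operator.index(7))       # -> 7
    try:
        operator.index(3.5)        # floats do not implement __index__
    except TypeError as e:
        print("rejected:", e)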

-Travis




Re: [Python-Dev] __index__ clipping

2006-08-10 Thread Travis Oliphant
Guido van Rossum wrote:
>
> What do you think (10**10).__index__() should return (when called from 
> Python)?
>
I'm with Guido on this point.  I think (10**10).__index__() should 
return the full long integer when called from within Python.
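
That is how it ended up working; a one-line check on current CPython:

    # No clipping: __index__ returns the full value on any platform.
    assert (10**10).__index__() == 10_000_000_000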

-Travis




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis Oliphant
Martin v. Löwis wrote:
> Josiah Carlson schrieb:
> 
>>One could also toss wxPython, VTK, or any one of the other GUI libraries
>>into the mix for visualizing those images, of which wxPython just
>>acquired no-copy display of PIL images, and being able to manipulate
>>them with numpy (of which some wxPython built in classes use numpy to
>>speed up manipulation) would be very useful.
> 
> 
> I'm doubtful that this PEP alone would allow zero-copy sharing of images
> for display. Often, the libraries need the data in a different format.
> So they need to copy, even if they could understand the other format.
> However, the PEP won't allow "understanding" the format. If I know I
> have an array of 4-byte values: which of them is R, G, B, and A?
> 

You give a name to the fields: 'R', 'G', 'B', and 'A'.
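
In NumPy dtype terms, a sketch of what naming the fields looks like:

    import numpy as np

    # A 4-byte pixel described as four named one-byte fields.
    rgba = np.dtype([('R', np.uint8), ('G', np.uint8),
                     ('B', np.uint8), ('A', np.uint8)])

    pixels = np.zeros(8, dtype=rgba)   # eight 4-byte pixels
    pixels['A'] = 255                  # address the alpha channel by name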


-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis Oliphant
Jim Jewett wrote:
> Travis E. Oliphant wrote:
> 
> 
>>Two packages need to share a chunk of memory (the package authors do not
>>know each other and only have Python as a common reference).  They
>>both want to describe that the memory they are sharing has some
>>underlying binary structure.
> 
> 
> As a quick sanity check, please tell me where I went off track.
> 
> it sounds to me like you are assuming that:
> 
> (1)  The memory chunk represents a single object (probably an array of
> some sort)
> (2)  That subchunks can themselves be described by a (single?)
> repeating C struct.
> (3)  You can't just use the C header, since you want this at run-time.
> (4)  It would be enough if you could say
> 
> This is an array of 500 elements that look like
> 
> struct {
>   int  simple;
>   struct nested {
>     char name[30];
>     char addr[45];
>     int  amount;
>   }
> }

Sure.  I think that's pretty much it.  I assume you mean object in the 
general sense and not as in (Python object).


> (5)  But is it not acceptable to use Martin's suggested ctypes
> equivalent of (building out from the inside):


Part of the problem is that ctypes uses a lot of different Python types 
(that's what I mean by "multi-object") to accomplish its goal.  What 
I'm looking for is a single Python type that can be passed around and 
explains binary data.

Remember the buffer protocol is in compiled code.  So, as a result,

1) It's harder to construct a class to pass through the protocol using 
the multiple-types approach of ctypes.

2) It's harder to interpret the object received through the buffer 
protocol.

Sure, it would be *possible* to use ctypes, but I think it would be very 
difficult.  Think about how you would write the get_data_format C 
function in the extended buffer protocol for NumPy if you had to import 
ctypes and then build a class just to describe your data.  How would you 
interpret what you get back?

The ctypes "format-description" approach is not as unified as a single 
Python type object that I'm proposing.

In NumPy, we have a very nice, compact description of complicated data 
already available.  Why not use what we've learned?
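
To make the contrast concrete, here is Jim's example layout described
both ways (a sketch; alignment details differ between the two by default):

    import ctypes
    import numpy as np

    # ctypes: every description is itself a new Python *type*.
    class Nested(ctypes.Structure):
        _fields_ = [('name', ctypes.c_char * 30),
                    ('addr', ctypes.c_char * 45),
                    ('amount', ctypes.c_int)]

    class Element(ctypes.Structure):
        _fields_ = [('simple', ctypes.c_int),
                    ('nested', Nested)]

    # NumPy: the description is an *instance* of the single dtype type.
    element = np.dtype([('simple', np.intc),
                        ('nested', [('name', 'S30'),
                                    ('addr', 'S45'),
                                    ('amount', np.intc)])])
    # "an array of 500 of these" is then just np.zeros(500, dtype=element)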

I don't think we should just *use ctypes because it's there* when the 
way it describes binary data was not constructed with the extended 
buffer protocol in mind.

The other option, of course, which would not introduce a new Python type 
is to use the array interface specification and pass a list of tuples. 
But, I think this is also unnecessarily wasteful because the sending 
object has to construct it and the receiving object has to de-construct 
it.  The whole point of the (extended) buffer protocol is to communicate 
this information more quickly.



-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis Oliphant
Greg Ewing wrote:
> Travis E. Oliphant wrote:
> 
> 
>>Greg Ewing wrote:
> 
> 
>>>What exactly does "bit" mean in that context?   
>>
>>Do you mean "big" ?
> 
> 
> No, you've got a data type there called "bit",
> which seems to imply a size, in contradiction
> to the size-independent nature of the other
> types. I'm asking what size-independent
> information it's meant to convey.

Ah.  I see what you were saying now.   I guess the 'bit' type is 
different (we actually don't have that type in NumPy so my understanding 
of it is limited).

The 'bit' type reinterprets the size information to be in units of "bits" 
and so implies a "bit-field" instead of another data-format.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis Oliphant
Martin v. Löwis wrote:
> Robert Kern schrieb:
> 
>>>As a unification mechanism, I think it is insufficient. I doubt it
>>>can express all the concepts that ctypes supports.
>>
>>What do you think is missing that can't be added?
> 
> 
> I can factually only report what is missing. Whether it can be added,
> I don't know. As I just wrote in a few other messages: pointers,
> unions, functions pointers, packed structs, incomplete/recursive
> types. Also "flexible array members" (i.e. open-ended arrays).
> 

I understand function pointers, pointers, and unions.

Function pointers are "supported" with the void data-type and could be 
more specifically supported if it were important.   People typically 
don't use the buffer protocol to send function pointers around in a way 
for which the void description wouldn't be enough.


Pointers are also "supported" with the void data-type.  If pointers to 
other data-types were an important feature to support, then this could 
be added in many ways (a flag on the data-type object for example is how 
this is done in NumPy).

Unions are actually supported (just define two fields with the same 
offset).
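
A sketch of that union trick with a NumPy-style dtype:

    import numpy as np

    # A 4-byte union: two fields that share offset 0.
    u = np.dtype({'names':   ['as_int', 'as_float'],
                  'formats': [np.int32, np.float32],
                  'offsets': [0, 0]})

    x = np.zeros(1, dtype=u)
    x['as_float'] = 1.0
    print(hex(int(x['as_int'][0])))    # -> 0x3f800000, the bits of 1.0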

I don't know what you mean by "packed structs" (unless you are talking 
about alignment issues in which case there is support for it).

I'm not sure I understand what you mean by "incomplete / recursive" 
types unless you are referring to something like a node where an element 
of the structure is a pointer to another structure of the same kind 
(like used in linked-lists or trees).  If that is the case, then it's 
easily supported once support for pointers is added.

I also don't know what you mean by "open-ended arrays."  The data-format 
is meant to describe a fixed-size chunk of data.

String syntax is not needed to support all of these things.  What I'm 
asking for and proposing is a way to construct an instance of a single 
Python type that communicates this data-format information in a 
standardized way.


-Travis




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis Oliphant
Armin Rigo wrote:
> Hi Travis,
> 
> On Fri, Oct 27, 2006 at 02:05:31PM -0600, Travis E. Oliphant wrote:
> 
>>This PEP proposes adapting the data-type objects from NumPy for
>>inclusion in standard Python, to provide a consistent and standard
>>way to discuss the format of binary data. 
> 
> 
> How does this compare with ctypes?  Do we really need yet another,
> incompatible way to describe C-like data structures in the standard
> library?

There is a lot of subtlety in the details that IMHO clouds the central 
issue which I will try to clarify here the way I see it.


First of all:

In order to make sense of the data-format object that I'm proposing you 
have to see the need to share information about data-format through an 
extended buffer protocol (which I will be proposing soon).  I'm not 
going to try to argue that right now because there are a lot of people 
who can do that.

So, I'm going to assume that you see the need for it.  If you don't, 
then just suspend concern about that for the moment.  There are a lot of 
us who really see the need for it.

Now:

To describe data-formats ctypes uses a Python type-object defined for 
every data-format you might need.

In my view this is an 'over-use' of the type-object and in fact, to be 
useful, requires the definition of a meta-type that carries the relevant 
additions to the type-object that are needed to describe data (like 
function pointers to get data in and out of Python objects).

My view is that it is unnecessary to use a different type object to 
describe each different data-type.

The route I'm proposing is to define (in C) a *single* new Python type 
(called a data-format type) that carries the information needed to 
describe a chunk of memory.

In this way *instances* of this new type define data-formats.
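
NumPy's dtype already follows this pattern, which may make it concrete
(a sketch):

    import numpy as np

    # Three very different memory layouts, all instances of one type.
    for d in (np.dtype(np.int32),                    # 4-byte integer
              np.dtype('f8'),                        # 8-byte float
              np.dtype([('x', 'f4'), ('y', 'f4')])): # small struct
        assert isinstance(d, np.dtype)               # one type, many instances
        print(d.itemsize, d)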

In ctypes *instances* of the "meta-type" (i.e. new types) define 
data-formats (actually I'm not sure if all the new c-types are derived 
from the same meta-type).

So, the big difference is that I think data-formats should be 
*instances* of a single type.  There is no need to define a Python 
type-object for every single data-type.  In fact, not only is there no 
need, it makes the extended buffer protocol I'm proposing even more 
difficult to use and explain.

Again, my real purpose is the extended buffer protocol.  This 
data-format type is a means to that end.  If the consensus is that 
nobody sees a greater use of the data-format type beyond the buffer 
protocol, then I will just write 1 PEP for the extended buffer protocol.


-Travis





Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis Oliphant
Greg Ewing wrote:
> Travis Oliphant wrote:
> 
> 
>>Part of the problem is that ctypes uses a lot of different Python types 
>>(that's what I mean by "multi-object") to accomplish its goal.  What 
>>I'm looking for is a single Python type that can be passed around and 
>>explains binary data.
> 
> 
> It's not clear that multi-object is a bad thing in and
> of itself. It makes sense conceptually -- if you have
> a datatype object representing a struct, and you ask
> for a description of one of its fields, which could
> be another struct or array, you would expect to get
> another datatype object describing that.
> 
> Can you elaborate on what would be wrong with this?
> 
> Also, can you clarify whether your objection is to
> multi-object or multi-type. They're not the same thing --
> you could have a data structure built out of multiple
> objects that are all of the same Python type, with
> attributes distinguishing between struct, array, etc.
> That would be single-type but multi-object.

I've tried to clarify this in another post.  Basically, what I don't 
like about the ctypes approach is that it is multi-type (every new 
data-format is a Python type).

In order to talk about all these Python types together, then they must 
all share some attribute (or else be derived from a meta-type in C with 
a specific function-pointer entry).

I think it is simpler to think of a single Python type whose instances 
convey information about data-format.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Travis Oliphant
Martin v. Löwis wrote:
> Travis Oliphant schrieb:
> 
>>So, the big difference is that I think data-formats should be 
>>*instances* of a single type.
> 
> 
> This is nearly the case for ctypes as well. All layout descriptions
> are instances of the type type. Nearly, because they are instances
> of subtypes of the type type:
> 
> py> type(ctypes.c_long)
> <type '_ctypes.SimpleType'>
> py> type(ctypes.c_double)
> <type '_ctypes.SimpleType'>
> py> type(ctypes.c_double).__bases__
> (<type 'type'>,)
> py> type(ctypes.Structure)
> <type '_ctypes.StructType'>
> py> type(ctypes.Array)
> <type '_ctypes.ArrayType'>
> py> type(ctypes.Structure).__bases__
> (<type 'type'>,)
> py> type(ctypes.Array).__bases__
> (<type 'type'>,)
> 
> So if your requirement is "all layout descriptions ought to have
> the same type", then this is (nearly) the case: they are instances
> of type (rather then datatype, as in your PEP).
> 

The big difference, however, is that by going this route you are forced 
to use the "type object" as your data-format "instance".  This is 
fitting a square peg into a round hole in my opinion.  To really be 
useful, you would need to add the attributes and (most importantly) 
C-function pointers and C-structure members to these type objects.  I 
don't even think that is possible in Python (even if you do create a 
meta-type that all the c-type type objects can use that carries the same 
information).

There are a few people claiming I should use the ctypes type-hierarchy 
but nobody has explained how that would be possible given the 
attributes, C-structure members and C-function pointers that I'm proposing.

In NumPy we also have a Python type for each basic data-format (we call 
them array scalars).  For a little while they carried the data-format 
information on the Python side.  This turned out to be not flexible 
enough.  So, we expanded the PyArray_Descr * structure which has always 
been a part of Numeric (and the array module array type) into an actual 
Python type and a lot of things became possible.

It was clear to me that we were "on to something".  Now, the biggest 
claim against the gist of what I'm proposing (details we can argue 
about), seems from my perspective to be a desire to "go backwards" and 
carry data-type information around with a Python type.

The data-type object did not just appear out of thin-air one day.  It 
really can be seen as an evolution from the beginnings of Numeric (and 
the Python array module).

So, this is what we came up with in the NumPy world.  Ctypes came up 
with something a bit different.  It is not "trivial" to "just use 
ctypes."  I could say the same thing and tell ctypes to just use NumPy's 
data-type object.   It could be done that way, but of course it would 
take a bit of work on the part of ctypes to make that happen.

Having ctypes in the standard library does not mean that any other 
discussion of how data-format should be represented has been decided on. 
If I had known that was what it meant to put ctypes in the standard 
library, I would have been more vocal several months ago.


-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Travis Oliphant
Nick Coghlan wrote:
> Travis E. Oliphant wrote:
> 
>>However, the existence of an alternative strategy using a single Python 
>>type and multiple instances of that type to describe binary data (which 
>>is the NumPy approach and essentially the array module approach) means 
>>that we can't just a-priori assume that the way ctypes did it is the 
>>only or best way.
> 
> 
> As a hypothetical, what if there was a helper function that translated a 
> description of a data structure using basic strings and sequences (along the 
> lines of what you have in your PEP) into a ctypes data structure?
> 

That would be fine and useful in fact.  I don't see how it helps the 
problem of "what to pass through the buffer protocol," though.  I see passing 
ctypes type objects around on the C-level as an unnecessary and 
burdensome approach unless the ctypes objects were significantly enhanced.


> 
> In fact, it may make sense to just use the lists/strings directly as the data 
> exchange format definitions, and let the various libraries do their own 
> translation into their private format descriptions instead of creating a new 
> one-type-to-describe-them-all.

Yes, I'm open to this possibility.   I basically want two things in the 
object passed through the extended buffer protocol:

1) It's fast on the C-level
2) It covers all the use-cases.

If just a particular string or list structure were passed, then I would 
drop the data-format PEP and just have the dataformat argument of the 
extended buffer protocol be that thing.

Then, something that converts ctypes objects to that special format 
would be very nice indeed.
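
A minimal sketch of such a converter; the helper name, the tiny format
table, and the list-of-tuples input are all assumptions for illustration:

    import ctypes

    # Hypothetical: map (name, format) pairs onto a ctypes Structure.
    _SIMPLE = {'i4': ctypes.c_int32,
               'f8': ctypes.c_double,
               'S30': ctypes.c_char * 30}

    def to_ctypes_struct(fields, name='Anon'):
        """Build a ctypes.Structure subclass from a field description."""
        return type(name, (ctypes.Structure,),
                    {'_fields_': [(n, _SIMPLE[f]) for n, f in fields]})

    Point = to_ctypes_struct([('x', 'f8'), ('y', 'f8')], name='Point')
    p = Point(1.0, 2.0)
    assert ctypes.sizeof(Point) == 16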

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Travis Oliphant
Martin v. Löwis wrote:
> Travis Oliphant schrieb:
> 
>>The big difference, however, is that by going this route you are forced 
>>to use the "type object" as your data-format "instance".
> 
> 
> Since everything is an object (an "instance) in Python, this is not
> such a big difference.
> 

I think it actually is.  Perhaps I'm wrong, but a type-object is still a 
special kind of an instance of a meta-type.  I once tried to add 
function pointers to a type object by inheriting from it.  But, I was 
told that Python is not set up to handle that.  Maybe I misunderstood.

Let me be very clear.  The whole reason I make any statements about 
ctypes is because somebody else brought it up.  I'm not trying to 
replace ctypes and the way it uses type objects to represent data 
internally.   All I'm trying to do is come up with a way to describe 
data-types through a buffer protocol.  The way ctypes does it is "too" 
bulky by defining a new Python type for every data-format.

While semantically you may talk about the equivalency of types being 
instances of a "meta-type" and regular objects being instances of a 
type, my understanding is still that there are practical differences 
when it comes to implementation --- and certain things that "can't be done".

Here's what I mean by the difference.

This is akin to what I'm proposing:

struct {
    PyObject_HEAD
    /* whatever you need to represent your instance --
       quite a bit of flexibility */
} PyDataFormatObject;

A Python type object (what every ctypes data-format "type" inherits 
from) has a C-structure:

struct {
    PyObject_VAR_HEAD
    char *tp_name;
    int tp_basicsize, tp_itemsize;

    /* Methods to implement standard operations */

    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    cmpfunc tp_compare;
    reprfunc tp_repr;

    ...
    ...

    PyObject *tp_bases;
    PyObject *tp_mro; /* method resolution order */
    PyObject *tp_cache;
    PyObject *tp_subclasses;
    PyObject *tp_weaklist;
    destructor tp_del;

    ... /* + more under certain conditions */
} PyTypeObject;


Why in the world do we need to carry all this extra baggage around in 
each data-format instance in order to just describe data?  I can see why 
it's useful for ctypes to do it and that's fine.  But, the argument that 
every exchange of data-format information should use this type-object 
instance is hard to swallow.
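
One hedged way to see the size difference (numbers vary by Python
version and build; a sketch):

    import sys
    import ctypes
    import numpy as np

    # A format carried as a plain instance vs. as a full type object.
    print(sys.getsizeof(np.dtype('i4')))   # dtype instance: ~10^2 bytes
    print(sys.getsizeof(ctypes.c_int))     # a type object: ~10^3 bytes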

So, I'm happy to let ctypes continue on doing what it's doing trusting 
its developers to have done something good.  I'd be happy to drop any 
reference to ctypes.  The only reason to have the data-type objects is 
something to pass as part of the extended buffer protocol.

> 
> 
> Can you explain why that is? In the PEP, I see two C functions:
> setitem and getitem. I think they can be implemented readily with
> ctypes' GETFUNC and SETFUNC function pointers that it uses
> all over the place.

Sure, but where do these function pointers live and where are they 
stored?  In ctypes they live in the CField object.  Now, this is closer to 
what I'm talking about.  But why is it not the same thing?  Why yet 
another type object to talk about fields of a structure?

These are rhetorical questions.  I really don't expect or need an answer 
because I'm not questioning why ctypes did what it did for solving the 
problem it was solving.  I am questioning anyone who claims that we 
should use this mechanism for describing data-formats in the extended 
buffer protocol.

> 
> I don't see a requirement to support C structure members or
> function pointers in the datatype object.
> 
> 
>>There are a few people claiming I should use the ctypes type-hierarchy 
>>but nobody has explained how that would be possible given the 
>>attributes, C-structure members and C-function pointers that I'm proposing.
> 
> 
> Ok, here you go. Remember, I'm still not claiming that this should be
> done: I'm just explaining how it could be done.

O.K.  Thanks for putting in the effort.   It doesn't answer my real 
concerns, though.

>>It was clear to me that we were "on to something".  Now, the biggest 
>>claim against the gist of what I'm proposing (details we can argue 
>>about), seems from my perspective to be a desire to "go backwards" and 
>>carry data-type information around with a Python type.
> 
> 
> I, at least, have no such desire. I just explained that the ctypes
> model of memory layouts is just as expressive as the one in the
> PEP. 

I agree with this.  I'm very aware of what "can" be done.

Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Travis Oliphant
Martin v. Löwis wrote:
> Travis E. Oliphant schrieb:
> 
>>But, there are distinct disadvantages to this approach compared to what 
>>I'm trying to allow.   Martin claims that the ctypes approach is 
>>*basically* equivalent but this is just not true.
> 
> 
> I may claim that, but primarily, my goal was to demonstrate that the
> proposed PEP cannot be used to describe ctypes object layouts (without
> checking, I can readily believe that the PEP covers everything in
> the array and struct modules).
> 

That's a fine argument.  You are right in terms of the PEP as it stands. 
  However, I want to make clear that a single Python type object *could* 
be used to describe data including all the cases you laid out.  It would 
not be difficult to extend the PEP to cover all the cases you've 
described --- I'm not sure that's desirable.  I'm not trying to replace 
what ctypes does.  I'm just trying to get something that we can use to 
exchange data-format information through the extended buffer protocol.

It really comes down to using Python type-objects as the instances 
describing data-formats (which ctypes does) or "normal" Python objects 
as the instances describing data-formats (what the PEP proposes).

> 
>>It could be made more 
>>true if the ctypes objects inherited from a "meta-type" and if Python 
>>allowed meta-types to expand their C-structures.  But, last I checked 
>>this is not possible.
> 
> 
> That I don't understand. a) what do you think is not possible?

Extending the C-structure of PyTypeObject and having Python types use 
that as their "type-object".

> b) why is that an important difference between a datatype and a ctype?

Because with instances of C-types you are stuck with the PyTypeObject 
structure.  If you want to add anything you have to do it in the 
dictionary.

Instances of a datatype allow adding anything after the PyObject_HEAD 
structure.

> 
> If you are suggesting that, given two Python types A and B, and
> B inheriting from A, that the memory layout of B cannot extend
> the memory layout of A, then: that is certainly possible in Python,
> and there are many examples for it.
>

I know this.  I've done it for many different objects.  I'm saying it's 
not quite the same when what you are extending is the PyTypeObject and 
trying to use it as the type object for some other object.


> 
>>A Python type object is a very particular kind of Python-type.  As far 
>>as I can tell, it's not as flexible in terms of the kinds of things you 
>>can do with the "instances" of a type object (i.e. what ctypes types 
>>are) on the C-level.
> 
> 
> Ah, you are worried that NumArray objects would have to be *instances*
> of ctypes types. That wouldn't be necessary at all. Instead, if each
> NumArray object had a method get_ctype(), which returned a ctypes type,
> then you would get the same desciptiveness that you get with the
> PEP's datatype.
> 

No, I'm not worried about that (It's not NumArray by the way, it's 
NumPy.  NumPy replaces both NumArray and Numeric).

NumPy actually interfaces with ctypes quite well.  This is how I learned 
anything I might know about ctypes.  So, I'm well aware of this.

What I am concerned about is using Python type objects (i.e. Python 
objects that can be cast in C to PyTypeObject *) outside of ctypes to 
describe data-formats when you don't need it and it just complicates 
dealing with the data-format description.

> 
>>Where is the discussion that crowned the ctypes way of doing things as 
>>"the one true way"
> 
> 
> It hasn't been crowned this way. Me, personally, I just said two things
> about this PEP and ctypes:

Thanks for clarifying, but I know you didn't say this.  Others, however, 
basically did.

> a) the PEP does not support all concepts that ctypes needs

It could be extended, but I'm not sure it *needs* to be in its real 
context.  I'm very sorry for contributing to the distraction that ctypes 
should adopt the PEP.  My words were unclear.  But, I'm not pushing for 
that.  I really have no opinion how ctypes describes data.

> b) ctypes can express all examples in the PEP
> in response to your proposal that ctypes should adopt the PEP, and
> that ctypes is not good enough to be the one true way.
> 

I think it is "good enough" in the semantic sense.  But, I think using 
type objects in this fashion for general-purpose data-description is 
overkill and will be much harder to extend and deal with.

-Travis




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Travis Oliphant
Thomas Heller wrote:
> 
> (I tried to read the whole thread again, but it is too large already.)
> 
> There is a (badly named, probably) api to access information
> about ctypes types and instances of this type.  The functions are
> PyObject_stgdict(obj) and PyType_stgdict(type).  Both return a
> 'StgDictObject' instance or NULL if the function fails.  This object
> is the ctypes' type object's __dict__.
> 
> StgDictObject is a subclass of PyDictObject and has fields that
> carry information about the C type (alignment requirements, size in bytes,
> plus some other stuff).  Also it contains several pointers to functions
> that implement (in C) struct-like functionality (packing/unpacking).
> 
> Of course several of these fields can only be used for ctypes-specific
> purposes, for example a pointer to the ffi_type which is used when
> calling foreign functions, or the restype, argtypes, and errcheck fields
> which are only used when the type describes a function pointer.
> 
> 
> This mechanism is probably a hack because it's not possible to add C 
> accessible
> fields to type objects, on the other hand it is extensible (in principle, at 
> least).
> 

Thank you for the description.  While I've studied the ctypes code, I 
still don't understand the purposes behind all the data-structures.

Also, I really don't have an opinion about ctypes' implementation.   All 
my comparisons simply come from resisting the "unexplained" idea that 
I'm supposed to use ctypes objects in a way they weren't really designed 
to be used.

For example, I'm pretty sure you were the one who made me aware that you 
can't just extend the PyTypeObject.  Instead you extended the tp_dict of 
the Python typeObject to store some of the extra information that is 
needed to describe a data-type like I'm proposing.

So, if I'm just describing data-format information, why do I need 
all this complexity (that makes ctypes implementation easier/more 
natural/etc)?  What if the StgDictObject is the Python data-format 
object I'm talking about?  It actually looks closer.

But, if all I want is the StgDictObject (or something like it), then why 
should I pass around the whole type object?

This is all I'm saying to those that want me to use ctypes to describe 
data-formats in the extended buffer protocol.  I'm not trying to change 
anything in ctypes.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Travis Oliphant
Martin v. Löwis wrote:
> Stephan Tolksdorf schrieb:
> 
>>While Travis' proposal encompasses the data format functionality within 
>>the struct module and overlaps with what ctypes has to offer, it does 
>>not aim to replace ctypes.
> 
> 
> This discussion could have been a lot shorter if he had said so.
> Unfortunately (?) he stated that it was *precisely* a motivation
> of the PEP to provide a standard data description machinery that
> can then be adopted by the struct, array, and ctypes modules.

Struct and array I was sure about.  Ctypes less sure.  I'm very sorry 
for the distraction I caused by misstating my objective.   My objective 
is really the extended buffer protocol.  The data-type object is a means 
to that end.

I do think ctypes could make use of the data-type object and that there 
is a real difference between using Python type objects as data-format 
descriptions and using another Python type for those descriptions.  I 
thought to go the ctypes route (before I even knew what ctypes did) but 
decided against it for a number of reasons.

But, nonetheless those are side issues.  The purpose of the PEP is to 
provide an object that the extended buffer protocol can use to share 
data-format information.  It should be considered primarily in that context.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Travis Oliphant
Paul Moore wrote:
> On 10/31/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
> 
>>Martin v. Löwis wrote:
>>
>>>[...] because I still don't quite understand what the PEP
>>>wants to achieve.
>>>
>>
>>Are you saying you still don't understand after having read the extended
>>buffer protocol PEP, yet?
> 
> 
> I can't speak for Martin, but I don't understand how I, as a Python
> programmer, might use the data type objects specified in the PEP. I
> have skimmed the extended buffer protocol PEP, but I'm conscious that
> no objects I currently use support the extended buffer protocol (and
> the PEP doesn't mention adding support to existing objects), so I
> don't see that as too relevant to me.

Do you use the PIL?  The PIL supports the array interface.

CVXOPT supports the array interface.

Numarray
Numeric
NumPy

all support the array interface.
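
For readers unfamiliar with it, the attribute-based protocol these
packages share looks like this (a sketch with NumPy):

    import numpy as np

    a = np.arange(12, dtype=np.float64).reshape(3, 4)
    info = a.__array_interface__   # the attribute lookup the PEP avoids
    print(info['shape'])           # (3, 4)
    print(info['typestr'])         # '<f8' on little-endian builds
    print(info['data'])            # (address, read-only flag)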

> 
> I have also installed numpy, and looked at the help for numpy.dtype,
> but that doesn't add much to the PEP. 

The source-code is available.

> The freely available chapters of
> the numpy book explain how dtypes describe data structures, but not
> how to use them. 


> The freely available Numeric documentation doesn't
> refer to dtypes, as far as I can tell. 

It kind of does; they are PyArray_Descr * structures in Numeric.  They 
just aren't Python objects.


> Is there any documentation on
> how to use dtypes, independently of other features of numpy? 

There are examples and other help pages at http://www.scipy.org

> If not,
> can you clarify where the benefit lies for a Python user of this
> proposal? (I understand the benefits of a common language for
> extensions to communicate datatype information, but why expose it to
> Python? How do Python users use it?)
> 

The only benefit I imagine would be for an extension module library 
writer and for users of the struct and array modules.  But, other than 
that, I don't know.  It actually doesn't have to be exposed to Python. 
I used Python notation in the PEP to explain what is basically a 
C-structure.  I don't care if the object ever gets exposed to Python.

Maybe that's part of the communication problem.


> This is probably all self-evident to the numpy community, but I think
> that as the PEP is aimed at a wider audience it needs a little more
> background.

It's hard to write that background because most of what I understand is 
from the NumPy community.  I can't give you all the examples but my 
concern is that you have all these third party libraries out there 
describing what is essentially binary data and using either 
string-copies or the buffer protocol + extra information obtained by 
some method or attribute that varies across the implementations.  There 
should really be a standard for describing this data.

There are attempts at it in the struct and array module.  There is the 
approach of ctypes but I claim that using Python type objects is 
overkill for the purposes of describing data-formats.

-Travis




Re: [Python-Dev] PEP: Extending the buffer protocol to share array information.

2006-10-31 Thread Travis Oliphant
Martin v. Löwis wrote:
> Travis E. Oliphant schrieb:
> 
>>Several extensions to Python utilize the buffer protocol to share
>>the location of a data-buffer that is really an N-dimensional
>>array.  However, there is no standard way to exchange the
>>additional N-dimensional array information so that the data-buffer
>>is interpreted correctly.  The NumPy project introduced an array
>>interface (http://numpy.scipy.org/array_interface.shtml) through a
>>set of attributes on the object itself.  While this approach
>>works, it requires attribute lookups which can be expensive when
>>sharing many small arrays.  
> 
> 
> Can you please give examples for real-world applications of this
> interface, preferably examples involving multiple
> independently-developed libraries?
> ("this" being the current interface in NumPy - I understand that
>  the PEP's interface isn't implemented, yet)
> 

Examples of Need

 1) Suppose you have a image in *.jpg format that came from a
 camera and you want to apply Fourier-based image recovery to try
 and de-blur the image using modified Wiener filtering.  Then you
 want to save the result in *.png format.  The PIL provides an easy
 way to read *.jpg files into Python and write the result to *.png,
 and NumPy provides the FFT and the array math needed to implement
 the algorithm.  Rather than have to dig into the details of how
 NumPy and the PIL interpret chunks of memory in order to write a
 "converter" between NumPy arrays and PIL arrays, there should be
 support in the buffer protocol so that one could write
 something like:

 # Read the image
 a = numpy.frombuffer(Image.open('myimage.jpg'))

 # Process the image.
 A = numpy.fft.fft2(a)
 B = A*inv_filter
 b = numpy.fft.ifft2(B).real

 # Write it out
 Image.frombuffer(b).save('filtered.png')

 Currently, without this proposal you have to worry about the "mode"
 the image is in and get its shape using a specific method call
 (this method call is different for every object you might want to
 interface with).

 2) The same argument for a library that reads and writes
 audio or video formats exists.

 3) You want to blit images onto a GUI Image buffer for rapid
 updates but need to do math processing on the image values
 themselves or you want to read the images from files supported by
 the PIL.

 If the PIL supported the extended buffer protocol, then you would
 not need to worry about the "mode" and the "shape" of the Image.

 What's more, you would also be able to accept images from any
 object (like NumPy arrays or ctypes arrays) that supported the
 extended buffer protocol without having to learn how it shares
 information like shape and data-format.


I could have also included examples from PyGame, OpenGL, etc.  I thought 
people were more aware of this argument as we've made it several times 
over the years.  It's just taken this long to get to a point to start 
asking for something to get into Python.


> Paul Moore (IIRC) gave the example of equalising the green values
> and maximizing the red values in a PIL image by passing it to NumPy:
> Is that a realistic (even though not-yet real-world) example? 

I think so, but I've never done something like that.

> If
> so, what algorithms of NumPy would I use to perform this image
> manipulation (and why would I use NumPy for it if I could just
> write a for loop that does that in pure Python, given PIL's
> getpixel/setdata)?

Basically you would use array math operations and reductions (ufuncs and 
its methods, which are included in NumPy).  You would do it this way for 
speed.   It's going to be a lot slower doing those loops in Python. 
NumPy provides the ability to do them at close-to-C speeds.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis Oliphant
Jim Jewett wrote:
> I'm still not sure exactly what is missing from ctypes.  To make this 
> concrete:

As far as I can tell, the only thing missing from ctypes' "expressiveness" 
in terms of what you "can" do is the byte-order representation.

What is missing is ease of use for producers and consumers in 
interpreting the data-type.   When I speak of producers and consumers, 
I'm largely talking about C (or Java or .NET) code writers.

Producers must basically use Python code to create classes of various 
types.   This is going to be slow in 'C'.  Probably slower than the 
array interface (which is what we have now, informally).

Consumers are going to have a hard time interpreting the result.  I'm 
not even sure how to do that, in fact.  I'd like NumPy to be able to 
understand ctypes as a means to specify data.  Would I have to check 
against all the sub-types of CDataType, pull out the fields, check the 
tp_name of the type object?  I'm not sure.

It seems like a string with the C-structure would be better as a 
data-representation, but then a third-party library would want to parse 
that, so Python might as well have its own parser for data-types. 

So, Python might as well have its own way to describe data.  My claim 
is this default way should *not* be overloaded by using Python 
type-objects (the ctypes way).  I'm claiming that the NumPy way of 
using a separate Python object to describe data-types is the right one. 
I'm not saying the NumPy object itself should be used.  I'm saying we 
should come up with a single DataFormatType whose instances express the 
data formats in ways that other packages can produce and consume (or 
even use internally).  

It would be easy for NumPy to "use" the default Python object in its 
PyArray_Descr * structure.  It would also be easy for ctypes to "use" 
the default Python object in its StgDict object that is the tp_dict of 
every ctypes type object.

It would be easy for the struct module to allow for this data-format 
object (instead of just strings) in its methods. 

It would be easy for the array module to accept this data-format object 
(instead of just typecodes) in its constructor.

Lots of things would suddenly be more consistent throughout both the 
Python and C-Python user space.
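
Today each module spells the same concept its own way, which is the
inconsistency being described (a sketch):

    import struct
    import array
    import ctypes
    import numpy as np

    # Three spellings of "a 4-byte int followed by an 8-byte float":
    struct_fmt = struct.Struct('=id')              # struct: format string

    np_fmt = np.dtype([('a', 'i4'), ('b', 'f8')])  # NumPy: dtype instance

    class CFmt(ctypes.Structure):                  # ctypes: a new type
        _pack_ = 1
        _fields_ = [('a', ctypes.c_int32), ('b', ctypes.c_double)]

    # array can only spell single typecodes; structs are out of reach.
    arr = array.array('d', [1.0, 2.0])

    assert struct_fmt.size == np_fmt.itemsize == ctypes.sizeof(CFmt) == 12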

Perhaps after discussion, it becomes clear that the ctypes approach is 
sufficient to be "that thing" that all modules use to share data-format 
information.  It's definitely expressive enough.   But, my argument is 
that NumPy data-type objects are also "pretty close." so why should they 
be rejected.  We could also make a "string-syntax" do it.

>
> You have said that creating whole classes is too much overhead, and
> the description should only be an instance.  To me, that particular
> class (arrays of 500 structs) still looks pretty lightweight.  So
> please clarify when it starts to be a problem.
>

> (1)  For simple types -- mapping
>   char name[30];  ==> ("name", c_char*30)
>
> Do you object to using the c_char type?
> Do you object to the array-of-length-30 class, instead of just having
> a repeat or shape attribute?
> Do you object to naming the field?
>
> (2)  For the complex types, nested and struct
>
> Do you object to creating these two classes even once?   For example,
> are you expecting to need different classes for each buffer, and to
> have many buffers created quickly?
I object to the way I "consume" and "produce" the ctypes interface.  
It's much too slow to be used on the C-level for sharing many small 
buffers quickly.
>
> Is creating that new class a royal pain, but frequent (and slow)
> enough that you can't just make a call into python (or ctypes)?
>
> (3)  Given that you will describe X, is X*500 (==> a type describing
> an array of 500 Xs) a royal pain in C?  If so, are you expecting to
> have to do it dynamically for many sizes, and quickly enough that you
> can't just let ctypes do it for you?

That pretty much sums it up (plus the pain of having to basically write 
Python code from "C").
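
To make the pain concrete, here is a rough sketch (my names; error 
handling omitted) of what a C consumer has to do today just to learn 
the layout of a ctypes Structure subclass --- every step is a 
Python-level call made from C:

#include <Python.h>

/* Walk the _fields_ list of a ctypes Structure subclass from C. */
static void dump_ctypes_fields(PyObject *stype)
{
    PyObject *fields = PyObject_GetAttrString(stype, "_fields_");
    Py_ssize_t i, n = PySequence_Length(fields);
    for (i = 0; i < n; i++) {
        PyObject *pair = PySequence_GetItem(fields, i);  /* (name, ctype) */
        PyObject *name = PySequence_GetItem(pair, 0);
        PyObject *ctype = PySequence_GetItem(pair, 1);
        /* ... still more attribute lookups needed to learn sizes,
           offsets, byte-order, and nesting ... */
        Py_DECREF(name); Py_DECREF(ctype); Py_DECREF(pair);
    }
    Py_DECREF(fields);
}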

-Travis



Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-01 Thread Travis Oliphant
Martin v. Löwis wrote:
> Travis E. Oliphant schrieb:
>   
>>> Or, if it does have uses independent of the buffer extension: what
>>> are those uses?
>>>   
>> So that NumPy and ctypes and audio libraries and video libraries and 
>> database libraries and image-file format libraries can communicate about 
>> data-formats using the same expressions (in Python).
>> 
>
> I find that puzzling. In what way can the specification of a data type
> enable communication? Don't you need some kind of protocol for it
> (i.e. operations to be invoked)? Also, do you mean that these libraries
> can communicate with each other? Or with somebody else? If so, with
> whom?
>   
What is puzzling?  I've just specified the extended buffer protocol as 
something concrete that data-format objects are shared through.   That's 
on the C-level.  I gave several examples of where such sharing would be 
useful.

Then, I gave examples in Python of how sharing data-formats would also 
be useful so that modules could support the same means to construct 
data-formats (instead of struct using strings, array using typecodes, 
ctypes using its type-objects, and NumPy using dtype objects).
>   
>> What problem do you have in defining a standard way to communicate about 
>> binary data-formats (not just images)?  I still can't figure out why you 
>> are so resistant to the idea.  MPI had to do it.
>> 
>
> I'm afraid of "dead" specifications, things whose only motivation is
> that they look nice. They are just clutter. There are a few examples
> of this already in Python, like the character buffer interface or
> the multi-segment buffers.
>   
O.K.  I can understand that concern.  But, all you do is make struct, 
array, and ctypes support the same data-format specification (by support 
I mean have a way to "consume" and "produce" the data-format object to 
the natural representation that they have internally) and you are 
guaranteed it won't "die."   In fact, what would be ideal is for the 
PIL, NumPy, CVXOpt, PyMedia, PyGame, pyre, pympi, PyVoxel, etc., etc. 
(there really are many modules that should be able to talk to each other 
more easily) to all support the same data-format representations. Then, 
you don't have to learn everybody's  re-invention of the same concept 
whenever you encounter a new library that does something with binary data.

How much time do you actually spend with binary data (sound, video, 
images, just plain numbers from a scientific experiment) and trying to 
use multiple Python modules to manipulate it?  If you don't spend much 
time, then I can understand why you don't understand the need.
> As for MPI: It didn't just independently define a data types system.
> Instead, it did that, *and* specified the usage of the data types
> in operations such as MPI_SEND. It is very clear what the scope of
> this data description is, and what the intended usage is.
>
> Without specifying an intended usage, it is impossible to evaluate
> whether the specification meets its goals.
>   
What is not understood about the intended usage in the extended buffer 
protocol?  What is not understood about the intended usage of giving the 
array and struct modules a uniform way to represent binary data?
> Ok, that would be a new usage: I expected that datatype instances
> always come in pairs with memory allocated and filled according to
> the description. 
To me that is the most important usage, but it's not the *only* one. 

> If you are proposing to modify/extend the API
> of the struct and array modules, you should say so somewhere (in
> a PEP).
>   
Sure, I understand that.  But, if there is no data-format object, then 
there is no PEP to "extend the struct and array modules" to support it.  
Chicken before the egg, and all that.
> I expect that the primary readers/users of the PEP would be people who
> have to write libraries: i.e. people implementing NumPy, struct, array,
> and people who implement algorithms that operate on data.

Yes, but not only them.  If it's a default way to represent data,  then 
*users* of those libraries that "consume" the representation would also 
benefit by learning a standard.

>  So usability
> of the specification is a matter of how easy it is to *write* a library
> that does perform the image manipulation.
>
>   
>> If you really want to know.  In NumPy it might look like this:
>>
>> Python code:
>>
>> img['r'] = img['g']
>> img['b'] = img['g']
>> 
>
> That's not what I'm asking. Instead, what does the NumPy code look
> like that gets invoked on these read-and-write operations? Does it
> only use the void* pointing to the start of the data, and the
> datatype object? If not, how would C code look like that only has
> the void* and the datatype object?
>
>   
>> dtype = img->descr;
>> 
>
> In this code, is descr a datatype object? ...
>   
Yes.  But, I have a mistake later...
>   
>> r_field = PyDict_GetItemString(dtype,'r');
>> 
Actually it should read PyDict_GetItemString(dtype->fields).

Re: [Python-Dev] PEP: Extending the buffer protocol to share array information.

2006-11-01 Thread Travis Oliphant
Fredrik Lundh wrote:
> Chris Barker wrote:
> 
> 
>>While /F suggested we get off the PIL bandwagon
> 
> 
> I suggest we drop the obsession with pointers to memory areas that are 
> supposed to have a specific format; modern data access API:s don't work 
> that way for good reasons, so I don't see why Python should grow a 
> standard based on that kind of model.
> 

Please give us an example of a modern data-access API (i.e. an 
application that uses one)?

I presume you are not fundamentally opposed to sharing memory given the 
example you gave.

> the "right solution" for things like this is an *API* that lets you do 
> things like:
> 
>  view = object.acquire_view(region, supported formats)
>  ... access data in view ...
>  view.release()
> 
> and, for advanced users
> 
>  format = object.query_format(constraints)
> 

It sounds like you are concerned about the memory-area-not-current 
problem.  Yeah, it can be a problem (but not an unsolvable one). 
Objects that share memory through the buffer protocol just have to be 
careful about resizing themselves or eliminating memory.

Anyway, it's a problem not solved by the buffer protocol.  I have no 
problem with trying to fix that in the buffer protocol, either.

It's all completely separate from what I'm talking about as far as I can 
tell.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis Oliphant
Martin v. Löwis wrote:
> Travis E. Oliphant schrieb:
> 
>>2) complex-valued types (you might argue that it's just a 2-array of 
>>floats, but you could say the same thing about int as an array of 
>>bytes).  The point is how do people interpret the data.  Complex-valued 
>>data-types are very common.  It is one reason Fortran is still used by 
>>scientists.
> 
> 
> Well, by the same reasoning, you could argue that pixel values (RGBA)
> are missing in the PEP. It's a convenience, sure, and it may also help
> interfacing with the platform's FORTRAN implementation - however, are
> you sure that NumPy's complex layout is consistent with the platform's
> C99 _Complex definition?
> 

I think so (it is on gcc).  And yes, where you draw the line between 
fundamental and "derived" data-type is somewhat arbitrary.  I'd rather 
include complex-numbers than not given their prevalence in the 
data-streams I'm trying to make compatible with each other.

> 
>>3) Unicode characters
>>
>>4) What about floating-point representations that are not IEEE 754 
>>4-byte or 8-byte.
> 
> 
> Both of these are available in a platform-dependent way: if the
> platform uses non-IEEE754 formats for C float and C double, ctypes
> will interface with that just fine. It is actually vice versa:
> IEEE-754 4-byte and 8-byte is not supported in ctypes.

That's what I meant.  The 'f' kind in the data-type description is also 
intended to mean "platform float" whatever that is.  But, a complete 
data-format representation would have a way to describe other 
bit-layouts for floating point representation.  Even if you can't 
actually calculate directly with them without conversion.

> Same for Unicode: the platform's wchar_t is supported (as you said),
> but not a platform-independent (say) 4-byte little-endian.

Right.

It's a matter of scope.  Frankly, I'd be happy enough to start with 
"typecodes" in the extended buffer protocol (that's where the array 
module is now) and then move up to something more complete later.

But, since we already have an array interface for record-arrays to share 
information and data with each other, and ctypes showing all of its 
power, then why not be more complete?



-Travis



Re: [Python-Dev] PEP: Extending the buffer protocol to share array information.

2006-11-01 Thread Travis Oliphant
Fredrik Lundh wrote:
> Chris Barker wrote:
> 
> 
>>While /F suggested we get off the PIL bandwagon
> 
> 
> I suggest we drop the obsession with pointers to memory areas that are 
> supposed to have a specific format; modern data access API:s don't work 
> that way for good reasons, so I don't see why Python should grow a 
> standard based on that kind of model.
> 
> the "right solution" for things like this is an *API* that lets you do 
> things like:
> 
>  view = object.acquire_view(region, supported formats)
>  ... access data in view ...
>  view.release()
> 
> and, for advanced users
> 
>  format = object.query_format(constraints)

So, if the extended buffer protocol were enhanced to enforce this kind 
of viewing and release, then would you support it?

Basically, the extended buffer protocol would at the same time as 
providing *more* information about the "view" require the implementer to 
understand the idea of "holding" and "releasing" the view.

Would this basically require the object supporting the extended buffer 
protocol to keep some kind of list of who has views (or at least a 
number indicating how many views there are)?


-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis Oliphant
Alexander Belopolsky wrote:
> Travis Oliphant  ieee.org> writes:
> 
> 
>>>>b = buffer(array('d', [1,2,3]))
> 
> 
> there is not much that I can do with b.  For example, if I want to pass it to
> numpy, I will have to provide the type and shape information myself:
> 
> 
>>>>numpy.ndarray(shape=(3,), dtype=float, buffer=b)
> 
> array([ 1.,  2.,  3.])
> 
> With the extended buffer protocol, I should be able to do
> 
> 
>>>>numpy.array(b)

or just

numpy.array(array.array('d',[1,2,3]))

and leave out the buffer object altogether.


> 
> 
> So let's start by solving this problem and limit it to data that can be found
> in a standard library array.  This way we can postpone the discussion of 
> shapes,
> strides and nested structs.

Don't lump those ideas together.  Shapes and strides are necessary for 
N-dimensional arrays (they're essentially what *define* the N-dimensional 
array).   I really don't want to sacrifice those in the extended buffer 
protocol.  If you want to separate them into different functions then 
that is a possibility.

> 
> If we manage to agree on the standard way to pass primitive type information,
> it will be a big achievement and immediately useful because simple arrays are
> already in the standard library.
> 

We could start there, I suppose.  Especially if it helps us all get on 
the same page.  But, we already see the applications beyond this simple 
case so I would like to have at least an "eye" for the more difficult 
case which we already have a working solution for in the "array interface".

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis Oliphant
Paul Moore wrote:
> 
> 
> Enough of the abstract. As a concrete example, suppose I have a (byte)
> string in my program containing some binary data - an ID3 header, or a
> TCP packet, or whatever. It doesn't really matter. Does your proposal
> offer anything to me in how I might manipulate that data (assuming I'm
> not using NumPy)? (I'm not insisting that it should, I'm just trying
> to understand the scope of the PEP).
> 

What do you mean by "manipulate the data"?  The proposal for a 
data-format object would help you describe that data in a standard way 
and therefore share that data between several libraries that would be able 
to understand the data (because they all use and/or understand the 
default Python way to handle data-formats).

It would be up to the other packages to "manipulate" the data.

So, what you would be able to do is take your byte-string and create a 
buffer object which you could then share with other packages:

Example:

b = buffer(bytestr, format=data_format_object)

Now.

a = numpy.frombuffer(b)
a['field1']  # prints data stored in the field named "field1"

etc.

Or.

cobj = ctypes.frombuffer(b)

# Now, cobj is a ctypes object that is basically a "structure" that
# can be passed directly to your C-code.

Does this help?

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-02 Thread Travis Oliphant
Martin v. Löwis wrote:
> Travis E. Oliphant schrieb:
> 
>>>2. Should primitive type codes be characters or integers (from an enum) at
>>>C level?
>>>- I prefer integers
>>
>>>3. Should size be expressed in bits or bytes?
>>>- I prefer bits
>>>
>>
>>So, you want an integer enum for the "kind" and an integer for the 
>>bitsize?   That's fine with me.
>>
>>One thing I just remembered.  We have T_UBYTE and T_BYTE, etc. defined 
>>in structmember.h already.  Should we just re-use those #defines while 
>>adding to them to make an easy to use interface for primitive types?
> 
> 
> Notice that those type codes imply sizes, namely the platform sizes
> (where "platform" always means "what the C compiler does"). So if
> you want to have platform-independent codes as well, you shouldn't
> use the T_ codes.
> 

In NumPy we've found it convenient to use both.   Basically, we've set 
up a header file that "does the translation" using #defines and typedefs 
to create things like (on a 32-bit platform)

typedef int npy_int32;
#define NPY_INT32 NPY_INT

so that either the T_code-like enum or the bit-width name can be used 
interchangeably.

Typically people want to specify bit-widths (and see their data-types in 
bit-widths) but in C-code that implements something you need to use one 
of the platform integers.

I don't know if we really need to bring all of that over.

-Travis



Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-02 Thread Travis Oliphant
> 
> IIUC, so far the 'data-object' carries information about the structure
> of the data it describes.
> 
> Couldn't it go a step further and have also some functionality?
> Converting the data into a Python object and back?
>

Yes, I had considered it to do that.  That's why the setfunc and getfunc 
functions were written the way they were.

-teo



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-03 Thread Travis Oliphant

>
> Perhaps the most relevant thing to pull from this conversation is back 
> to what Martin has asked about before: "flexible array members". A TCP 
> packet has no defined length (there isn't even a header field in the 
> packet for this, so in fairness we can talk about IP packets which 
> do). There is no way for me to describe this with the pre-PEP 
> data-formats.
>
> I feel like it is misleading of you to say "it's up to the package to 
> do manipulations," because you glanced over the fact that you can't 
> even describe this type of data. ISTM, that you're only interested in 
> describing repetitious fixed-structure arrays. 
Yes, that's right.  I'm only interested in describing binary data with a 
fixed length.  Others can help push it farther than that (if they even 
care).

> If we are going to have a "default Python way to handle data-formats", 
> then don't you feel like this falls short of the mark?
Not for me.   We can fix what needs fixing, but not if we can't get out 
of the gate.
>
> I fear that you speak about this in too grandiose terms and are now 
> trapped by people asking, "well, can I do this?" I think for a lot of 
> folks the answer is: "nope." With respect to the network packets, this 
> PEP doesn't do anything to fix the communication barrier.

Yes it could if you were interested in pushing it there.   No, I didn't 
solve that particular problem with the PEP (because I can only solve the 
problems I'm aware of), but I do think the problem could be solved.   We 
have far too many nay-sayers on this list, I think.

Right now, I don't have time to push this further.  My real interest is 
the extended buffer protocol.  I want something that works for that.  
When I do have time to discuss it again, I might come back and 
push some more. 

But, not now.

-Travis





Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-03 Thread Travis Oliphant
Martin v. Löwis wrote:

>Travis Oliphant schrieb:
>  
>
>>>>r_field = PyDict_GetItemString(dtype,'r');
>>>>
>>>>
>>>>
>>Actually it should read PyDict_GetItemString(dtype->fields).  The
>>r_field is a tuple (data-type object, offset).  The fields attribute is
>>(currently) a Python dictionary.
>>
>>
>
>Ok. This seems to be missing in the PEP. 
>
Yeah, actually quite a bit is missing, because I wanted to float the 
idea for discussion before "getting the details perfect"  (which of 
course they wouldn't be if it was just my input producing them).

>In this code, where is PyArray_GetField coming from?
>
This is a NumPy-specific C-API.  That's why I was confused about why 
you wanted me to show how I would do it. 

But, what you are actually asking is how would another application use 
the data-type information to do the same thing using the data-type 
object and a pointer to memory.  Is that correct?

This is a reasonable thing to request.  And your example is a good one.  
I will use the PEP to explain it.

Ultimately, the code you are asking for will have to have some kind of 
dispatch table for different binary code depending on the actual 
data-types being shared (unless all that is needed is a copy in which 
case just the size of the element area can be used).  In my experience, 
the dispatch table must be present for at least the "simple" 
data-types.  The data-types built up from there can depend on those.

In NumPy, the data-type objects have function pointers to accomplish all 
the things NumPy does quickly.  So, each data-type object in NumPy 
points to a function-pointer table and the NumPy code defers to it to 
actually accomplish the task (much like Python really).
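
As a rough illustration (hypothetical names and layout --- not NumPy's 
actual table), such a per-type function-pointer table might look like:

#include <string.h>

/* One entry per "simple" kind; a consumer finds the entry matching
   the shared data-format and defers to its function pointers. */
typedef struct {
    char kind;       /* e.g. 'i' for integer, 'f' for float */
    int bits;        /* element size in bits */
    void (*copyswap)(void *dst, const void *src, int swap);
} type_table_entry;

static void copyswap_float64(void *dst, const void *src, int swap)
{
    memcpy(dst, src, 8);
    (void)swap;      /* byte-swapping for non-native order elided */
}

static type_table_entry simple_types[] = {
    { 'f', 64, copyswap_float64 },
    /* ... entries for the other simple kinds ... */
};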

Not all libraries will support working with all data-types.  If they 
don't support it, they just raise an error indicating that it's not 
possible to share that kind of data. 

> What does
>it do? If I wanted to write this code from scratch, what
>should I write instead? Since this is all about a flat
>memory block, I'm surprised I need "true" Python objects
>for the field values in there.
>  
>
Well, actually, the block could be "strided" as well. 

So, you would write something that gets the pointer to the memory and 
then gets the extended information (dimensionality, shape, and strides, 
and data-format object).  Then, you would get the offset of the field 
you are interested in from the start of the element (it's stored in the 
data-format representation).

Then do a memory copy from the right place (using the array iterator in 
NumPy you can actually do it without getting the shape and strides 
information first but I'm holding off on that PEP until an N-d array is 
proposed for Python).   I'll write something like that as an example and 
put it in the PEP for the extended buffer protocol.  
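
In the meantime, here is a rough sketch of the kind of code I mean (the 
names are mine; base, n, stride, and the field's offset and size would 
come from the exporter and the data-format object):

#include <string.h>
#include <stddef.h>

/* Copy one named field out of every element of a 1-d strided block. */
static void copy_field(char *dst, const char *base,
                       ptrdiff_t n, ptrdiff_t stride,
                       size_t field_offset, size_t field_size)
{
    ptrdiff_t i;
    for (i = 0; i < n; i++)
        memcpy(dst + i * field_size,
               base + i * stride + field_offset,
               field_size);
}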

-Travis




>  
>
>>But, the other option (especially for code already written) would be to
>>just convert the data-format specification into its own internal
>>representation.
>>
>>
>
>Ok, so your assumption is that consumers already have their own
>machinery, in which case ease-of-use would be the question how
>difficult it is to convert datatype objects into the internal
>representation.
>
>Regards,
>Martin
>  
>



[Python-Dev] Extended Buffer Interface/Protocol

2007-03-21 Thread Travis Oliphant

I'm soliciting feedback on my extended buffer protocol that I am 
proposing for inclusion in Python 3000.  As soon as the Python 3.0 
implementation is complete, I plan to back-port the result to Python 
2.6, therefore, I think there may be some interest on this list.

Basically, the extended buffer protocol seeks to allow memory sharing with

1) information about what is "in" the memory (float, int, C-structure, etc.)
2) information about the "shape" of the memory (if any)

3) information about discontiguous memory segments


Number 3 is where I could use feedback --- especially from PIL users and 
developers.   Strides are a common way to think about a possibly 
discontiguous chunk of memory (which appear in NumPy when you select a 
sub-region from a larger array). The strides vector tells you how many 
bytes to skip in each dimension to get to the next memory location for 
that dimension.
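
In code it is just address arithmetic; a minimal sketch (my names) for 
a 2-d block of doubles:

#include <stddef.h>

/* Fetch element (i, j) of a strided 2-d array of doubles.  buf and
   the per-dimension byte strides come from the exporting object. */
static double get_elem_2d(const void *buf, const ptrdiff_t strides[2],
                          ptrdiff_t i, ptrdiff_t j)
{
    return *(const double *)((const char *)buf
                             + i * strides[0] + j * strides[1]);
}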

Because NumPy uses this memory model as do several compute libraries 
(like BLAS and LAPACK), it makes sense to allow this memory model to be 
shared between objects in Python.

Currently, the proposed buffer interface eliminates the multi-segment 
option (for Python 3.0) which I think was originally put in place 
because of the PIL.   However, I don't know if it is actually used by 
any extension types.  This is a big reason why Guido wants to drop the 
multi-segment interface option.

The question is should we eliminate the possibility of sharing memory 
for objects that store data basically as "arrays" of arrays (i.e. true 
C-style arrays).  That is what I'm currently proposing, but I could also 
see an argument that states that if we are going to support strided 
memory access, we should also support array of array memory access.

If this is added, then it would be another function call that gets 
array-of-arrays-style memory from the object.  What do others think of 
these ideas?


One possible C-API call that Python could grow with the current buffer 
interface is to allow contiguous-memory mirroring of discontiguous 
memory, or an iterator object that iterates through every element of any 
object that exposes the buffer protocol.
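
The core of such an iterator is just an "odometer" over the shape 
vector.  A rough sketch (my names; assumes every shape entry is 
positive, and the real thing would be wrapped in a C-API call):

#include <stddef.h>

/* Compute the address of the element at the given index vector. */
static void *element_at(char *buf, int ndims,
                        const ptrdiff_t *strides, const ptrdiff_t *index)
{
    char *ptr = buf;
    int d;
    for (d = 0; d < ndims; d++)
        ptr += index[d] * strides[d];
    return ptr;
}

/* Advance the index vector odometer-style (last dimension fastest).
   Returns 0 when the iteration has wrapped around (i.e. finished). */
static int next_index(int ndims, const ptrdiff_t *shape, ptrdiff_t *index)
{
    int d;
    for (d = ndims - 1; d >= 0; d--) {
        if (++index[d] < shape[d])
            return 1;
        index[d] = 0;
    }
    return 0;
}

/* Typical use: zero the index array, then
       do { p = element_at(buf, ndims, strides, index); ... }
       while (next_index(ndims, shape, index));                */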


Thanks for any feedback,

-Travis Oliphant





Re: [Python-Dev] Extended Buffer Interface/Protocol

2007-03-21 Thread Travis Oliphant

Attached is the PEP.

:PEP: XXX
:Title: Revising the buffer protocol
:Version: $Revision: $
:Last-Modified: $Date:  $
:Author: Travis Oliphant <[EMAIL PROTECTED]>
:Status: Draft
:Type: Standards Track
:Content-Type: text/x-rst
:Created: 28-Aug-2006
:Python-Version: 3000

Abstract


This PEP proposes re-designing the buffer API (PyBufferProcs
function pointers) to improve the way Python allows memory sharing
in Python 3.0.

In particular, it is proposed that the multiple-segment and
character buffer portions of the buffer API be eliminated and
additional function pointers be provided to allow sharing any
multi-dimensional nature of the memory and what data-format the
memory contains.

Rationale
=

The buffer protocol allows different Python types to exchange a
pointer to a sequence of internal buffers.  This functionality is
*extremely* useful for sharing large segments of memory between
different high-level objects, but it is too limited and has issues.

1. There is the little (never?) used "sequence-of-segments" option
   (bf_getsegcount)

2. There is the apparently redundant character-buffer option
   (bf_getcharbuffer)

3. There is no way for a consumer to tell the buffer-API-exporting
   object it is "finished" with its view of the memory and
   therefore no way for the exporting object to be sure that it is
   safe to reallocate the pointer to the memory that it owns (for
   example, the array object reallocating its memory after sharing
   it with the buffer object which held the original pointer led
   to the infamous buffer-object problem).

4. Memory is just a pointer with a length. There is no way to
   describe what is "in" the memory (float, int, C-structure, etc.)

5. There is no shape information provided for the memory.  But,
   several array-like Python types could make use of a standard
   way to describe the shape-interpretation of the memory
   (wxPython, GTK, pyQT, CVXOPT, PyVox, Audio and Video
   Libraries, ctypes, NumPy, data-base interfaces, etc.)

6. There is no way to share discontiguous memory (except through
   the sequence of segments notion).  

   There are two widely used libraries that use the concept of
   discontiguous memory: PIL and NumPy.  Their view of discontiguous
   arrays is different, though.  This buffer interface allows
   sharing of either memory model.  Exporters will only use one 
   approach and consumers may choose to support discontiguous 
   arrays of each type however they choose. 

   NumPy uses the notion of constant striding in each dimension as its
   basic concept of an array. With this concept, a simple sub-region
   of a larger array can be described without copying the data.
   Thus, stride information is the additional information that must be
   shared. 

   The PIL uses a more opaque memory representation. Sometimes an
   image is contained in a contiguous segment of memory, but sometimes
   it is contained in an array of pointers to the contiguous segments
   (usually lines) of the image.  The PIL is where the idea of multiple
   buffer segments in the original buffer interface came from. 
  

   NumPy's strided memory model is used more often in computational
   libraries and because it is so simple it makes sense to support
   memory sharing using this model.  The PIL memory model is used often
   in C-code where a 2-d array can be then accessed using double
   pointer indirection:  e.g. image[i][j].  

   The buffer interface should allow the object to export either of these
   memory models.  Consumers are free to either require contiguous memory
   or write code to handle either memory model.  

Proposal Overview
=

* Eliminate the char-buffer and multiple-segment sections of the
  buffer-protocol.

* Unify the read/write versions of getting the buffer.

* Add a new function to the interface that should be called when
  the consumer object is "done" with the view.

* Add a new variable to allow the interface to describe what is in
  memory (unifying what is currently done now in struct and
  array)

* Add a new variable to allow the protocol to share shape information

* Add a new variable for sharing stride information

* Add a new mechanism for sharing array of arrays. 

* Fix all objects in the core and the standard library to conform
  to the new interface

* Extend the struct module to handle more format specifiers

Specification
=

Change the PyBufferProcs structure to

::

typedef struct {
     getbufferproc bf_getbuffer;
     releasebufferproc bf_releasebuffer;
} PyBufferProcs;

::

typedef PyObject *(*getbufferproc)(PyObject *obj, void **buf,
   Py_ssize_t *len, int *writeable,
   char **format, int *ndims,
   Py_ssize_t **shape,
   Py_ssize_t **strides,
   

Re: [Python-Dev] Extended Buffer Interface/Protocol

2007-03-21 Thread Travis Oliphant
Greg Ewing wrote:
> Travis Oliphant wrote:
> 
> 
>>The question is should we eliminate the possibility of sharing memory 
>>for objects that store data basically as "arrays" of arrays (i.e. true 
>>C-style arrays).
> 
> 
> Can you clarify what you mean by this? Are you talking
> about an array of pointers to other arrays? (This is
> not what I would call an "array of arrays", even in C.)

I'm talking about arrays of pointers to other arrays:

i.e. if somebody defined in C

float B[10][20]

then B would be an array of pointers to arrays of floats.

> 
> Supporting this kind of thing could be a slippery slope,
> since there can be arbitrary levels of complexity to
> such a structure. E.g do you support a 1d array of
> pointers to 3d arrays of pointers to 2d arrays? Etc.
> 

Yes, I saw that.  But, it could actually be supported, in general.
The shape information is available.  If a 3-d array is meant then ndims
is 3 and you would re-cast the returned pointer appropriately.

In other words, suppose that instead of strides you can request a 
variable through the buffer interface with type void **segments.

Then, by passing the address to a void * variable to the routine you 
would receive the array.  Then, you could handle 1-d, 2-d, and 3-d cases 
using something like this:

This is pseudocode:

void *segments;
int ndims;
Py_ssize_t *shape;
char *format;


(&ndims, &shape, &format, and &segments) are passed to the buffer 
interface.

if strcmp(format, "f") != 0
 raise an error.

if (ndims == 1)
    var = (float *)segments
    /* loop i over shape[0], using var[i] */
else if (ndims == 2)
    var = (float **)segments
    /* loop i, j over shape[0], shape[1], using var[i][j] */
else if (ndims == 3)
    var = (float ***)segments
    /* loop i, j, k, using var[i][j][k] */

> The more different kinds of format you support, the less
> likely it becomes that the thing consuming the data
> will be willing to go to the trouble required to
> understand it.

That is certainly true.   I'm really only going through the trouble 
since the multiple-segment option already exists and the PIL has this 
memory model (although I have not heard PIL developers clamoring for 
support --- I'm just being sensitive to that extension type).

> 
> 
>>One possible C-API call that Python could grow with the current buffer 
>>interface is to allow contiguous-memory mirroring of discontiguous 
>>memory,
> 
> 
> I don't think the buffer protocol itself should incorporate
> anything that requires implicitly copying the data, since
> the whole purpose of it is to provide direct access to the
> data without need for copying.

No, this would not be the buffer protocol, but merely a C-API that would 
use the buffer protocol - i.e. it is just a utility function as you mention.

> 
> It would be okay to supply some utility functions for
> re-packing data, though.
> 
> 
>>or an iterator object that iterates through every element of any 
>>object that exposes the buffer protocol.
> 
> 
> Again, for efficiency reasons I wouldn't like to involve
> Python objects and iteration mechanisms in this. 

I was thinking more of a C-iterator, like NumPy provides.  This can be 
very efficient (as long as the loop is not in Python).

It sure provides a nice abstraction that lets you deal with 
discontiguous arrays as if they were contiguous, though.

> The
> buffer interface is meant to give you raw access to the
> data at raw C speeds. Anything else is outside its scope,

Sure.  These things are just ideas about *future* utility functions that 
might make use of the buffer interface and motivate its design.

Thanks for your comments.


-Travis



Re: [Python-Dev] Extended Buffer Interface/Protocol

2007-03-22 Thread Travis Oliphant
Greg Ewing wrote:
> Travis Oliphant wrote:
> 
> 
>>I'm talking about arrays of pointers to other arrays:
>>
>>i.e. if somebody defined in C
>>
>>float B[10][20]
>>
>>then B would be an array of pointers to arrays of floats.
> 
> 
> No, it wouldn't, it would be a contiguously stored
> 2-dimensional array of floats. An array of pointers
> would be
> 
>float *B[10];
> 
> followed by code to allocate 10 arrays of 20 floats
> each and initialise B to point to them.
> 

You are right, of course, that example was not correct.  I think the 
point is still valid, though.   One could still use the shape to 
indicate how many levels of pointers-to-pointers there are (i.e. how 
many pointer dereferences are needed to select out an element).  Further 
dimensionality could then be reported in the format string.

This would not be hard to allow.  It also would not be hard to write a 
utility function to copy such shared memory into a contiguous segment to 
provide a C-API that allows casual users to avoid the details of memory 
layout when they are writing an algorithm that just uses the memory.
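
A sketch of such a utility for the simplest case (one level of pointer 
indirection and equal-sized segments; the names are mine):

#include <string.h>
#include <stddef.h>

/* Flatten memory exported as an array of equal-sized segment pointers
   (e.g. the lines of an image) into one contiguous block. */
static void copy_segments_contiguous(char *dst, char *const *segments,
                                     size_t nsegments, size_t segment_bytes)
{
    size_t i;
    for (i = 0; i < nsegments; i++)
        memcpy(dst + i * segment_bytes, segments[i], segment_bytes);
}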

> I can imagine cases like that coming up in practice.
> For example, an image object might store its data
> as four blocks of memory for R, G, B and A planes,
> each of which is a contiguous 2d array with shape
> and stride -- but you want to view it as a 3d
> array byte[plane][x][y].

All we can do is have the interface actually be able to describe its 
data.  Users would have to take that information and write code 
accordingly.

In this case, for example, one possibility is that the object would 
raise an error if strides were requested.  It would also raise an error 
if contiguous data was requested (or I guess it could report the R 
channel only if it wanted to).   Only if segments were requested could 
it return an array of pointers to the four memory blocks.  It could then 
report itself as a 2-d array of shape (4, H)  where H is the height. 
Each element of the array would be reported as "%sB" % W where W is the 
width of the image (i.e. each element of the 2-d array would be a 1-d 
array of length W).

Alternatively it could report itself as a 1-d array of shape (4,) with 
elements "(H,W)B"

A user would have to write the algorithm correctly in order to access 
the memory correctly.

Alternatively, a utility function that copies into a contiguous buffer 
would allow the consumer to "not care" about exactly how the memory is 
layed out.  But, the buffer interface would allow the utility function 
to figure it out and do the right thing for each exporter.  This 
flexibility would not be available if we don't allow for segmented 
memory in the buffer interface.

So, I don't think it's that hard to at least allow the multiple-segment 
idea into the buffer interface (as long as all the segments are the same 
size, mind you).  It's only one more argument to the getbuffer call.


-Travis



Re: [Python-Dev] Extended Buffer Interface/Protocol

2007-03-26 Thread Travis Oliphant
Carl Banks wrote:
> We're done.  Return pointer.

Thank you for this detailed example.  I will have to parse it in more 
depth but I think I can see what you are suggesting.

> 
> First, I'm not sure why getbuffer needs to return a view object. 

The view object in your case would just be the ImageObject.  The reason 
I was thinking the function should return "something" is to provide more 
flexibility in what a view object actually is.

I've also been going back and forth between explicitly passing all this 
information around or placing it in an actual view-object.  In some 
sense, a view object is a NumPy array in my mind.  But, with the 
addition of isptr we are actually expanding the memory abstraction of 
the view object beyond an explicit NumPy array.

In the most common case, I envisioned the view object would just be the 
object itself in which case it doesn't actually have to be returned. 
While returning the view object would allow unspecified flexibilty in 
the future, it really adds nothing to the current vision.

We could add a view object separately as an abstract API on top of the 
buffer interface.

> 
> 
> Second question: what happens if a view wants to re-export the buffer? 
> Do the views of the buffer ever change?  Example, say you create a 
> transposed view of a Numpy array.  Now you want a slice of the 
> transposed array.  What does the transposed view's getbuffer export?

Basically, you could not alter the internal representation of the object 
while views which depended on those values were around.

In NumPy, a transposed array actually creates a new NumPy object that 
refers to the same data but has its own shape and strides arrays.

With the new buffer protocol, the NumPy array would not be able to alter 
its shape/strides or reallocate its data areas while views were being 
held by other objects.

With the shape and strides information, the format information, and the 
data buffer itself, there are actually several pieces of memory that may 
need to be protected because they may be shared with other objects. 
This makes me wonder if releasebuffer should contain an argument which 
states whether or not to release the memory, the shape and strides 
information, the format information, or all of it.

Having such a thing as a view object would actually be nice because it 
could hold on to a particular view of data with a given set of shape and 
strides (whose memory it owns itself) and then the exporting object 
would be free to change its shape/strides information as long as the 
data did not change.

> 
> The reason I ask is: if things like "buf" and "strides" and "shape" 
> could change when a buffer is re-exported, then it can complicate things 
> for PIL-like buffers.  (How would you account for offsets in a dimension 
> that's in a subarray?)

I'm not sure what you mean, offsets are handled by changing the starting 
location of the pointer to the buffer.

-Travis


[Python-Dev] An updated extended buffer PEP

2007-03-26 Thread Travis Oliphant

Hi Carl and Greg,

Here is my updated PEP which incorporates several parts of the 
discussions we have been having. 

-Travis


 


Re: [Python-Dev] An updated extended buffer PEP

2007-03-26 Thread Travis Oliphant
Travis Oliphant wrote:
> Hi Carl and Greg,
> 
> Here is my updated PEP which incorporates several parts of the 
> discussions we have been having. 
> 

And here is the actual link:

http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/doc/pep_buffer.txt




> -Travis
> 
> 
>  



Re: [Python-Dev] An updated extended buffer PEP

2007-03-27 Thread Travis Oliphant
Lisandro Dalcin wrote:
> On 3/26/07, Travis Oliphant <[EMAIL PROTECTED]> wrote:
>> Here is my updated PEP which incorporates several parts of the
>> discussions we have been having.
>
> Travis, it looks really good, below my comments
I hope you don't mind me replying to python-dev.

>
> 1- Is it hard to EXTEND PyBufferProcs in order to be able to use all
> this machinery in Py 2.X series, not having to wait until Py3k?

No, I don't think it will be hard.  I just wanted to focus on Py3k since 
it is going to happen before Python 2.6 and I wanted it discussed in 
that world.
>
> 2- Its not clear for me if this PEP will enable object types defined
> in the Python side to export buffer info. This is a feature I really
> like in numpy, and simplifies my life a lot when I need to export
> memory for C/C++ object wrapped with the help of tools like SWIG.
This PEP does not address that.  You will have to rely on the objects 
themselves for any such information.
>
> 3- Why not to  constraint the returned 'view' object to be of a
> specific type defined in the C side (and perhaps available in the
> Python side)? This 'view' object could maintain a reference to the
> base object containing the data, could call releasebuffer using the
> base object when the view object is decref'ed, and can have a flag
> field for things like OWN_MEMORY, OWN_SHAPE, etc in order to properly
> manage memory deallocation. Does all this make sense?

Yes, that was my original thinking and we are kind of coming back to it 
after several iterations.   Perhaps, though, we can stick with an 
object-less buffer interface but have this "view object" as an expanded 
buffer object.

-Travis





Re: [Python-Dev] An updated extended buffer PEP

2007-03-27 Thread Travis Oliphant
Carl Banks wrote:
> Travis E. Oliphant wrote:
>> I think we are getting closer.   What do you think about Greg's idea 
>> of basically making the provider the bufferinfo structure and having 
>> the exporter handle copying memory over for shape and strides if it 
>> wants to be able to change those before the lock is released.
>
> It seems like it's just a different way to return the data.  You could 
> do it by setting values through pointers, or do it by returning a 
> structure.  Which way you choose is a minor detail in my opinion.  I'd 
> probably favor returning the information in a structure.
>
> I would consider adding two fields to the structure:
>
> size_t structsize; /* size of the structure */
Why is this necessary?  Can't you get that by sizeof(bufferinfo)?

> PyObject* releaser; /* the object you need to call releasebuffer on */ 
Is this so that another object could be used to manage releases if desired?

-Travis



Re: [Python-Dev] Extended buffer PEP

2007-04-08 Thread Travis Oliphant
Carl Banks wrote:
> Only one concern:
>
> > typedef int (*getbufferproc)(PyObject *obj, struct bufferinfo 
> *view)
>
>
> I'd like to see it accept a flags argument over what kind of buffer 
> it's allowed to return.  I'd rather not burden the user to check all 
> the entries in bufferinfo to make sure it doesn't get something 
> unexpected.
Yes, I agree. We had something like that at one point. 
>
> I imagine most uses of buffer protocol would be for direct, 
> one-dimensional arrays of bytes with no striding.  It's not clear 
> whether read-only or read-write should be the least common 
> denominator, so require at least one of these flags:
>
> Py_BUF_READONLY
> Py_BUF_READWRITE
>
> Then allow any of these flags to allow more complex access:
>
> Py_BUF_MULTIDIM - allows strided and multidimensional arrays
> Py_BUF_INDIRECT - allows indirect buffers (implies Py_BUF_MULTIDIM)
>
> An object is allowed to return a simpler array than requested, but not 
> more complex.  If you allow indirect buffers, you might still get a 
> one-dimensional array of bytes.
>
>
> Other than that, I would add a note about the other things considered 
> and rejected (the old prototype for getbufferproc, the delegated 
> buffer object).  List whether to backport the buffer protocol to 2.6 
> as an open question.

Thanks for the suggestions.
>
> Then submit it as a real PEP.  I believe this idea has run its course 
> as PEP XXX and needs a real number.  

How does one do that?   Who assigns the number?  I thought I "had" 
submitted it as a real PEP.

-Travis



Re: [Python-Dev] Extended Buffer Protocol - simple use examples

2007-04-09 Thread Travis Oliphant
Paul Moore wrote:
> Hi,
> I'll admit right off that I haven't followed all of the extended
> buffer protocol discussions - I have no real need for anything much
> beyond the existing "here's a blob of memory" level of functionality.
> 
> I have skimmed (briefly, I'll admit!) the pre-PEP, but I've found it
> extremely difficult to find a simple example of the basic (in my view)
> use case of an undifferentiated block of bytes.
> 

This is a great suggestion and it was on my to-do list.  I've included 
some examples of this use-case in the new PEP.

> 
> 1. (Producer) I have a block of memory in my C extension and I want to
> expose it as a simple contiguous block of bytes to Python.

This is now Ex. 2 in the PEP.

> 
> 2. (Consumer) I want to get at a block of memory exposed as a buffer.
> I am only interested in, and only support, viewing a buffer as a block
> of contiguous bytes. I expect most if not all extensions to be able to
> provide such a view.
> 

This is now Ex. 3.

Thanks for the suggestions.

-Travis



[Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-09 Thread Travis Oliphant



Changes:


 * added the "flags" variable to allow simpler calling for getbuffer.

 * added some explanation of ideas that were discussed and abandoned.

 * added examples for simple use cases.

 * added more C-API calls to allow easier usage.


Thanks for all feedback.

-Travis

PEP: 3118
Title: Revising the buffer protocol
Version: $Revision$
Last-Modified: $Date$
Authors: Travis Oliphant <[EMAIL PROTECTED]>, Carl Banks <[EMAIL PROTECTED]>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Aug-2006
Python-Version: 3000

Abstract


This PEP proposes re-designing the buffer interface (PyBufferProcs
function pointers) to improve the way Python allows memory sharing
in Python 3.0.

In particular, it is proposed that the character buffer portion 
of the API be eliminated and the multiple-segment portion be 
re-designed in conjunction with allowing for strided memory
to be shared.   In addition, the new buffer interface will 
allow the sharing of any multi-dimensional nature of the
memory and what data-format the memory contains. 

This interface will allow any extension module to either 
create objects that share memory or create algorithms that
use and manipulate raw memory from arbitrary objects that 
export the interface. 


Rationale
=

The Python 2.X buffer protocol allows different Python types to
exchange a pointer to a sequence of internal buffers.  This
functionality is *extremely* useful for sharing large segments of
memory between different high-level objects, but it is too limited and
has issues:

1. There is the little used "sequence-of-segments" option
   (bf_getsegcount) that is not well motivated. 

2. There is the apparently redundant character-buffer option
   (bf_getcharbuffer)

3. There is no way for a consumer to tell the buffer-API-exporting
   object it is "finished" with its view of the memory and
   therefore no way for the exporting object to be sure that it is
   safe to reallocate the pointer to the memory that it owns (for
   example, the array object reallocating its memory after sharing
   it with the buffer object which held the original pointer led
   to the infamous buffer-object problem).

4. Memory is just a pointer with a length. There is no way to
   describe what is "in" the memory (float, int, C-structure, etc.)

5. There is no shape information provided for the memory.  But,
   several array-like Python types could make use of a standard
   way to describe the shape-interpretation of the memory
   (wxPython, GTK, pyQT, CVXOPT, PyVox, Audio and Video
   Libraries, ctypes, NumPy, data-base interfaces, etc.)

6. There is no way to share discontiguous memory (except through
   the sequence of segments notion).  

   There are two widely used libraries that use the concept of
   discontiguous memory: PIL and NumPy.  Their view of discontiguous
   arrays is different, though.  The proposed buffer interface allows
   sharing of either memory model.  Exporters will use only one
   approach and consumers may choose to support discontiguous 
   arrays of each type however they choose. 

   NumPy uses the notion of constant striding in each dimension as its
   basic concept of an array. With this concept, a simple sub-region
   of a larger array can be described without copying the data.
   Thus, stride information is the additional information that must be
   shared. 

   The PIL uses a more opaque memory representation. Sometimes an
   image is contained in a contiguous segment of memory, but sometimes
   it is contained in an array of pointers to the contiguous segments
   (usually lines) of the image.  The PIL is where the idea of multiple
   buffer segments in the original buffer interface came from.   

   NumPy's strided memory model is used more often in computational
   libraries and because it is so simple it makes sense to support
   memory sharing using this model.  The PIL memory model is sometimes 
   used in C-code where a 2-d array can be then accessed using double
   pointer indirection:  e.g. image[i][j].  

   The buffer interface should allow the object to export either of these
   memory models.  Consumers are free to either require contiguous memory
   or write code to handle one or both of these memory models. 

Proposal Overview
=

* Eliminate the char-buffer and multiple-segment sections of the
  buffer-protocol.

* Unify the read/write versions of getting the buffer.

* Add a new function to the interface that should be called when
  the consumer object is "done" with the memory area.  

* Add a new variable to allow the interface to describe what is in
  memory (unifying what is currently done now in struct and
  array)

* Add a new variable to allow the protocol to share shape information

* Add a new variable for sharing stride information

* Add a new mechanism for sharing arrays that must 
  be accessed using pointer indirection.

Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-09 Thread Travis Oliphant
Carl Banks wrote:
>
>
> Travis Oliphant wrote:
> > Py_BUF_READONLY
> >The returned buffer must be readonly and the underlying object 
> should make
> >its memory readonly if that is possible.
>
> I don't like the "if possible" thing.  If it makes no guarantees, it's 
> pretty much useless over Py_BUF_SIMPLE.
O.K.  Let's make it raise an error if it can't set it read-only.

>> Py_BUF_FORMAT
>>The consumer will be using the format string information so make 
>> sure that member is filled correctly. 
>
> Is the idea to throw an exception if there's some other data format 
> besides "b", and this flag isn't set?  It seems superfluous otherwise.

The idea is that a consumer may not care about the format and the 
exporter may want to know that to simplify the interface.  In other 
words, the flag is a way for the consumer to communicate that it wants 
format information (or not). 

Whether the exporter raises an exception if the format is not 
requested is up to the exporter.

>> Py_BUF_SHAPE
>>The consumer can (and might) make use of the ndims and shape 
>> members of the structure, so make sure they are filled in correctly.
>> Py_BUF_STRIDES (implies SHAPE)
>>The consumer can (and might) make use of the strides member of the 
>> structure (as well as ndims and shape)
>
> Is there any reasonable benefit for allowing Py_BUF_SHAPE without 
> Py_BUF_STRIDES?  Would the array be C- or Fortran-like?

Yes,  I could see a consumer not being able to handle simple striding 
but could handle shape information.  Many users of NumPy arrays like to 
think of the array as an N-d array but want to ignore striding.
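
A consumer like that implicitly assumes C-contiguous addressing.  Here 
is a small sketch (my names) of the strides that assumption corresponds 
to, computed from the shape and itemsize:

#include <stddef.h>

/* The byte strides a SHAPE-only consumer implicitly assumes:
   C-contiguous layout with the last index varying fastest. */
static void fill_contiguous_strides(int ndims, const ptrdiff_t *shape,
                                    ptrdiff_t itemsize, ptrdiff_t *strides)
{
    int i;
    ptrdiff_t sd = itemsize;
    for (i = ndims - 1; i >= 0; i--) {
        strides[i] = sd;
        sd *= shape[i];
    }
}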

I've made the changes in numpy's SVN.   Hopefully they will get mirrored 
over to the python PEP directory eventually.

-Travis



Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-11 Thread Travis Oliphant
Greg Ewing wrote:
> From PEP 3118:
>
>   A memory-view object is an extended buffer object that
>   should replace the buffer object in Python 3K.
>
>   typedef struct {
> PyObject_HEAD
> PyObject *base;
> struct bufferinfo view;
> int itemsize;
> int flags;
>   } PyMemoryViewObject;
>
> If the purpose is to provide Python-level access to an
> object via its buffer interface, then keeping a bufferinfo
> struct in it is the wrong implementation strategy, since it
> implies keeping the base object's memory locked as long as
> the view object exists.

Yes, but that was the intention.   The MemoryView Object is basically an 
N-d array object. 
>
> That was the mistake made by the original buffer object,
> and the solution is not to hold onto the info returned by
> the base object's buffer interface, but to make a new
> buffer request for each Python-level access.
I could see this approach also, but if we went this way then the memory 
view object should hold "slice" information so that it can be a "sliced" 
view of a memory area.

Because slicing NumPy arrays already does it by holding on to a view, I 
guess having an object that doesn't hold on to a view in Python but 
"re-gets" it every time it is needed, would be useful. 

In that case:

typedef struct {
PyObject_HEAD
PyObject *base;
int ndims;
PyObject **slices;  /* or 3 Py_ssize_t arrays */
int flags;
} PyMemoryViewObject;

would be enough to store, I suppose.


-Travis



Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-12 Thread Travis Oliphant
Neil Hodgson wrote:
> Travis Oliphant:
>
>> PEP: 3118
>> ...
>
>   I'd like to see the PEP include discussion of what to do when an
> incompatible request is received while locked. Should there be a
> standard "Can't do that: my buffer has been got" exception?
I'm not sure what standard we would use to make such a decision.  Sure, 
why not?

It's not something I'd considered. 

-Travis



Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-12 Thread Travis Oliphant
Lisandro Dalcin wrote:
> On 4/9/07, Travis Oliphant <[EMAIL PROTECTED]> wrote:
>
> Travis, all this is far better and simpler than previous approaches...
> Just a few comments

Thanks for your wonderful suggestions.  I've incorporated many of them.
>
> 1) I assume that 'bufferinfo' structure will be part of public Python
> API. In such a case, I think it should be renamed and prefixed.
> Something like 'PyBufferInfo' sounds good?

I prefer that as well. 

>
> 2) I also think 'bufferinfo' could also have an 'itemsize' field
> filled if Py_BUF_ITEMSIZE flag is passed. What do you think? Exporters
> can possibly fill this field more efficiently than next parsing
> 'format' string, it can also save consumers from an API call.
I think the itemsize member is a good idea.   I'm re-visiting what the 
flags should be after suggestions by Carl.
>
> 3) It does make sense to make 'format' be 'const char *' ?
Yes.
>
> 4) I am not sure about this, but perhaps 'buferingo' should save the
> flags passed to 'getbuffer' in a 'flags' field. This can be possibly
> needed at 'releasebuffer' call.
>
I think this is unnecessary.
>
>>   typedef struct {
>>   PyObject_HEAD
>>   PyObject *base;
>>   struct bufferinfo view;
>>   int itemsize;
>>   int flags;
>>   } PyMemoryViewObject;
>
> 5) If my previous comments go in, so 'PyMemoryViewObject' will not
> need 'itemsize' and 'flags' fields (they are in 'bufferinfo'
> structure).
>
After suggestions by Greg, I like the idea of the PyMemoryViewObject 
holding a pointer to another object (from which it gets memory on 
request) as well as information about a slice of that memory. 

Thus, the memory view object is something like:

typedef struct {
  PyObject_HEAD
  PyObject *base; 
  int ndims;
  Py_ssize_t *offsets;   /* slice starts */
  Py_ssize_t *lengths;   /* slice stops  */
  Py_ssize_t *skips;     /* slice steps  */
} PyMemoryViewObject;

It is more convenient to store any slicing information (so a memory view 
object could store an arbitrary slice of another object) as offsets, 
lengths, and skips which can be used to adjust the memory buffer 
returned by base.
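
A sketch of how those arrays might be applied each time the view 
re-fetches memory from base (offsets/lengths/skips as in the struct 
above; the per-dimension strides are whatever base reports when its 
buffer is re-requested):

    /* Find the start of the stored slice within base's buffer. */
    char *ptr = (char *)base_buf;
    int i;
    for (i = 0; i < ndims; i++)
        ptr += offsets[i] * strides[i];
    /* Element (k0, ..., kn) of the view then lives at
         ptr + k0*skips[0]*strides[0] + ... + kn*skips[n]*strides[n]
       and is in range as long as 0 <= ki < lengths[i]. */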

>> int PyObject_GetContiguous(PyObject *obj, void **buf, Py_ssize_t *len,
>>                            int fortran)
>>
>> Return a contiguous chunk of memory representing the buffer.  If a
>> copy is made then return 1.  If no copy was needed return 0.
>
> 8) If a copy was made, What should consumers call to free memory?

You are right.  We need a free function.

> 9) What about using a char, like 'c' or 'C', and 'f' or 'F', and 0 or
> 'a' or 'A' (any) ?

I'm happy with that too. 
>
>> int PyObject_CopyToObject(PyObject *obj, void *buf, Py_ssize_t len,
>>   int fortran)
>
> 10) Better name? Perhaps PyObject_CopyBuffer or PyObject_CopyMemory?
I'm not sure why those are better names.  The current name reflects the 
idea of copying the data into the object.

>
>> int PyObject_SizeFromFormat(char *)
>>
>> int PyObject_IsContiguous(struct bufferinfo *view, int fortran);
>>
>> void PyObject_FillContiguousStrides(int *ndims, Py_ssize_t *shape,
>>                                     int itemsize, Py_ssize_t *strides,
>>                                     int fortran)
>>
>> int PyObject_FillBufferInfo(struct bufferinfo *view, void *buf,
>>                             Py_ssize_t len, int readonly, int infoflags)
>>
>
> 11) Perhaps the 'PyObject_' prefix is wrong, as those functions does
> not operate with Python objects.

Agreed.

-Travis



Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-12 Thread Travis Oliphant
Carl Banks wrote:
>
> The thing that bothers me about this whole flags setup is that 
> different flags can do opposite things.
>
> Some of the flags RESTRICT the kind of buffers that can be
> exported (Py_BUF_WRITABLE); other flags EXPAND the kind of buffers that
> can be exported (Py_BUF_INDIRECT).  That is highly confusing and I'm -1
> on any proposal that includes both behaviors.  (Mutually exclusive sets
> of flags are a minor exception: they can be thought of as either
> RESTRICTING or EXPANDING, so they could be mixed with either.)
The mutually exclusive set is the one example of the restriction that 
you gave. 

I think the flags setup I've described is much closer to your Venn 
diagram concept than you give it credit for.   I've re-worded some of 
the discussion (see 
http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/numpy/doc/pep_buffer.txt) 
so that it is clearer that each flag is a description of what kind of 
buffer the consumer is prepared to deal with.

For example, if the consumer cares about what's 'in' the array, it uses 
Py_BUF_FORMAT.   Exporters are free to do what they want with this 
information.   I agree that NumPy would not force you to use its buffer 
only as a region of some specific type, but some other object may want 
to be much more restrictive and only export to consumers who will 
recognize the data stored for what it is.  I think it's up to the 
exporters to decide whether or not to raise an error when a certain kind 
of buffer is requested.

Basically, every flag corresponds to a different property of the buffer 
that the consumer is requesting:

Py_BUF_SIMPLE -- you are requesting the simplest possible buffer  (0x00)

Py_BUF_WRITEABLE -- get a writeable buffer  (0x01)

Py_BUF_READONLY -- get a read-only buffer  (0x02)

Py_BUF_FORMAT -- get a "formatted" buffer  (0x04)

Py_BUF_SHAPE -- get a buffer with shape information  (0x08)

Py_BUF_STRIDES -- get a buffer with stride information (and shape)  (0x18)

Py_BUF_OFFSET -- get a buffer with suboffsets (and strides and shape) (0x38)

This is a logical sequence with a clear progression.  Each flag is a bit 
that indicates something about how the consumer can use the buffer.  In 
other words, the consumer states what kind of buffer is being requested, 
and the exporter obliges (possibly saving significant time when the 
consumer does not request information the exporter would otherwise have 
to produce).
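
The hex values above encode that progression directly, so a consumer's 
request is just an OR of the members it intends to read (a sketch using 
the values listed above):

    #define Py_BUF_SIMPLE    0x00
    #define Py_BUF_WRITEABLE 0x01
    #define Py_BUF_READONLY  0x02
    #define Py_BUF_FORMAT    0x04
    #define Py_BUF_SHAPE     0x08
    #define Py_BUF_STRIDES   0x18    /* 0x10 | Py_BUF_SHAPE */
    #define Py_BUF_OFFSET    0x38    /* 0x20 | Py_BUF_STRIDES */

    /* "I will read strides, I will write to the memory, and I will
       parse the format string": */
    int flags = Py_BUF_STRIDES | Py_BUF_WRITEABLE | Py_BUF_FORMAT;
    /* (flags & Py_BUF_SHAPE) is non-zero by construction, because
       Py_BUF_STRIDES implies SHAPE at the bit level. */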

> I originally suggested a small set of flags that expand the set of 
> allowed buffers.  Here's a little Venn diagram of buffers to 
> illustrate what I was thinking:
>
> http://www.aerojockey.com/temp/venn.png
>
> With no flags, the only buffers allowed to be returned are in the "All"
> circle but no others.  Add Py_BUF_WRITABLE and now you can export
> writable buffers as well.  Add Py_BUF_STRIDED and the strided circle is
> opened to you, and so on.
>
> My recommendation is, any flag should turn on some circle in the Venn
> diagram (it could be a circle I didn't draw--shaped arrays, for
> example--but it should be *some* circle).
I don't think your Venn diagram is broad enough, and it unnecessarily 
limits the use of flags to communicate between consumer and exporter.   
We don't have to force these flags into that point of view for them to 
be productive.  If you have a specific alternative proposal, or specific 
criticisms, then I'm very willing to hear them.   

I've thought through the flags again, and I'm not sure how I would 
change them.  They make sense to me, especially in light of past usages 
of the buffer protocol (where most people request read-or-write buffers, 
i.e. Py_BUF_SIMPLE).  I'm also not sure our mental diagrams are both 
oriented the same way.  For me, the most restrictive requests are

Py_BUF_WRITEABLE | Py_BUF_FORMAT and Py_BUF_READONLY | Py_BUF_FORMAT

The most unrestrictive request (the largest circle in my mental Venn 
diagram) is

Py_BUF_OFFSET, followed by Py_BUF_STRIDES, followed by Py_BUF_SHAPE.

Adding Py_BUF_FORMAT, Py_BUF_WRITEABLE, or Py_BUF_READONLY serves to 
restrict any of the other circles.

Is this dual use of flags what bothers you?  (i.e. the use of some flags 
to restrict circles in your Venn diagram that are turned on by other 
flags --- you say Py_BUF_OFFSET | Py_BUF_WRITEABLE to get the 
intersection of the largest Py_BUF_OFFSET circle with the WRITEABLE 
subset?) 

Such concerns are not convincing to me.  Just don't think of the flags 
in that way.  Think of them as turning "on" members of the bufferinfo 
structure.  

>
>
>>>> Py_BUF_FORMAT
>>>>    The consumer will be using the format string information, so make
>>>>    sure that member is filled correctly.
>>>
>>> Is the idea to throw an exception if there's some other data format 
>>> besides "b", and this flag isn't set?  It seems superfluous otherwise.
>>
>> The idea is that a consumer may not care about the format and the 
>> exporter may want to know that to simplify the interface.  In other 
>> words the flag 

Re: [Python-Dev] Extended Buffer Protocol - simple use examples

2007-04-13 Thread Travis Oliphant
Paul Moore wrote:
> On 09/04/07, Travis Oliphant <[EMAIL PROTECTED]> wrote:
> 
>>>I have skimmed (briefly, I'll admit!) the pre-PEP, but I've found it
>>>extremely difficult to find a simple example of the basic (in my view)
>>>use case of an undifferentiated block of bytes.
>>>
>>
>>This is a great suggestion and it was on my to-do list.  I've included
>>some examples of this use-case in the new PEP.
> 
> 
> Nice - those look clear to me. One question - if a producer is
> generating a more complex data format (for example, the RGBA example
> in the PEP) then would the "simple consumer" code (Ex. 3) still get a
> pointer (or would the RGBA code need to go through extra effort to
> allow this)? Sorry, again this is probably clear from reading the PEP
> details, but my eyes glaze over when I read about strides, shapes,
> etc...

Unless the exporter took some measures to create a contiguous copy of 
its buffer in situations like that, the exporter would have to raise an 
error if a consumer asked for a simple contiguous buffer.


> 
> My motivation here is that it would be a shame if "old-style" code
> that was prepared to guess the format of a memory block stopped
> working when the producer of the memory added shape information (that
> the consumer didn't care about, except to validate its guess).

No, it wouldn't stop working, because right now, an error would get 
raised anyway.

For example, in NumPy, you can get an "old-style" buffer from an ndarray 
only as long as it is contiguous.  If it is discontiguous you will get 
an error.

Example:

 >>> import numpy
 >>> a = numpy.array([[1,2,3],[4,5,6]])
 >>> b = a[::2,::2]   # a discontiguous (strided) view of a
 >>> buffer(a)[:]     # works, because a is contiguous
 >>> buffer(b)[:]     # raises an error in current Python

Part of the intent of the extended buffer protocol is to allow you to 
share even discontiguous memory with those that know how to handle it.

In addition, the two C-API calls that copy data to and from a contiguous 
buffer exist to create a standard way to work with "any" object in 
routines that only know how to deal with contiguous memory.
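
A sketch of that usage, with the draft names proposed earlier in this 
thread (how a copy gets freed is exactly the open question raised above, 
and crunch() is a hypothetical contiguous-only algorithm):

    void *buf;
    Py_ssize_t len;
    int copied = PyObject_GetContiguous(obj, &buf, &len, 'C');
    if (copied < 0)
        return NULL;                     /* export failed */
    crunch(buf, len);                    /* works on flat memory only */
    if (copied == 1) {
        /* a copy was made: push results back into the object ... */
        PyObject_CopyToObject(obj, buf, len, 'C');
        /* ... and free `buf` here, once the PEP grows a free function */
    }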



-Travis




Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-13 Thread Travis Oliphant
Greg Ewing wrote:
> Travis Oliphant wrote:
>
>> It is more convenient to store any slicing information (so a memory 
>> view object could store an arbitrary slice of another object) as 
>> offsets, lengths, and skips which can be used to adjust the memory 
>> buffer returned by base.
>
> What happens if the base object changes its memory
> layout in such a way that the stored offsets, lengths
> and skips are no longer correct for the slice that
> was requested?

When the memory view object gets the buffer info again from the base 
object, it will be able to figure this out and raise an error.

-Travis



Re: [Python-Dev] Extended Buffer Interface/Protocol

2007-04-15 Thread Travis Oliphant
Greg Ewing wrote:

> But since the NumPy object has to know about the provider,
> it can simply pass the release call on to it if appropriate.
> I don't see how this case necessitates making the release call
> on a different object.
>
> I'm -1 on involving any other objects or returning object
> references from the buffer interface, unless someone can
> come up with a use case which actually *requires* this
> (as opposed to it just being something which might be
> "nice to have"). The buffer interface should be Blazingly
> Fast(tm), and messing with PyObject*s is not the way to
> get that.

The current proposal would still be fast, but would be more flexible for 
objects that don't have a memory representation that can be shared 
directly: they can create their own "sharing object" that perhaps copies 
the data into a contiguous chunk first.   Objects whose memory can be 
shared perfectly through the interface would simply pass themselves as 
the return value (after incrementing their "extant buffers" count by 
one).  

>
> Seems to me the lock should apply to *everything* returned
> by getbuffer. If the requestor is going to iterate over the
> data, and there are multiple dimensions, surely it's going to
> want to refer to the shape and stride info at regular intervals
> while it's doing that. Requiring it to make its own copy
> would be a burden.


There are two use cases that seem to be under discussion.

1) When you want to apply an algorithm to an arbitrary object that 
exposes the buffer interface

2) When you want to create an object that shares memory with another 
object exposing the buffer interface.

These two use cases have slightly different needs.  What I want to avoid 
is forcing the exporting object to be unable to change its shape and 
strides just because an object is using the memory for use case #2. 

I think the solution that states the shape and strides information are 
only guaranteed valid until the GIL is released is sufficient.  

Alternatively, one could release the shape and strides and format 
separately from the memory with a flag as a second argument to 
releasebuffer.

-Travis



Re: [Python-Dev] Extended Buffer Interface/Protocol

2007-04-15 Thread Travis Oliphant
Carl Banks wrote:

> ITSM that we are using the word "view" very differently.  Consider 
> this example:
>
> A = zeros((100,100))
> B = A.transpose()


You are thinking of NumPy's particular use case.  I'm thinking of a 
generic use case.  So, yes I'm using the word view in two different 
contexts.

In this scenario, NumPy does not even use the buffer interface.  It 
knows how to transpose its own objects and does so by creating a new 
NumPy object (with its own shape and strides) whose data buffer points 
to the data of "A".

Yes, I use the word "view" for this NumPy usage, but only in the context 
of NumPy.   In the PEP, I've been using the word "view" quite a bit more 
generically.

So, I don't think this is a good example because A.transpose() will 
never call getbuffer of the A object (it will instead use the known 
structure of NumPy directly).  So, let's talk about the generic 
situation instead of the NumPy specific one.

>
> I'd suggest the object returned by A.getbuffer should be called the 
> "buffer provider" or something like that.

I don't care what we call it.  I've been using the word "view" because 
of the obvious analogy to my use of view in NumPy.  When I had 
envisioned returning an actual object very similar to a NumPy array from 
the buffer interface, it made a lot of sense to call it a view.  Now, I'm 
fine with calling it a "buffer provider".

>
> For the sake of discussion, I'm going to avoid the word "view" 
> altogether.  I'll call A the exporter, as before.  B I'll refer to as 
> the requestor.  The object returned by A.getbuffer is the provider.

Fine.  Let's use that terminology since it is new and not cluttered by 
other uses in other contexts.

> Having thought quite a bit about it, and having written several 
> abortive replies, I now understand it and see the importance of it.  
> getbuffer returns the object that you are to call releasebuffer on.  
> It may or may not be the same object as exporter.  Makes sense, is 
> easy to explain.

Yes, that's exactly all I had considered it to be.   Only now, I'm 
wondering if we need to explicitly release a lock on the shape, strides, 
and format information as well as the buffer location information.

>
> It's easy to see possible use cases for returning a different object.  
> A  hypothetical future incarnation of NumPy might shift the 
> responsibility of managing buffers from NumPy array object to a hidden 
> raw buffer object.  In this scenario, the NumPy object is the 
> exporter, but the raw buffer object the provider.
>
> Considering this use case, it's clear that getbuffer should return the 
> shape and stride data independently of the provider.  The raw buffer 
> object wouldn't have that information; all it does is store a pointer 
> and keep a reference count.  Shape and stride is defined by the exporter.


So, who manages the memory for the shape, strides, and isptr arrays?   
When a provider is created, do these need to be created so that the 
shape and strides arrays are never deallocated while in use? 

The situation I'm considering is this: you have a NumPy array of shape 
(2,3,3), another package obtains a provider for it, and that provider 
retains a lock on the memory for a while.  Should it also retain a lock 
on the shape and strides arrays?   Can the NumPy array re-assign its 
shape and strides while the provider has still not been released?

I would like to say yes, which means that the provider must supply its 
own copy of the shape and strides arrays.  This could be the policy: 
namely, that the provider must supply the memory for the shape, strides, 
and format arrays, and that memory is guaranteed for as long as a lock 
is held.  In the case of NumPy, the provider could create its own copy 
of the shape and strides arrays (or do so when the shape and strides 
arrays are re-assigned).
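
Under that policy an exporter's getbuffer might look roughly like this 
(a sketch only; MyArrayObject and its fields are hypothetical stand-ins 
for an exporter's internals):

    typedef struct {
        PyObject_HEAD
        char *data;
        int nd;
        Py_ssize_t *dimensions;
        Py_ssize_t *strides;
    } MyArrayObject;                 /* hypothetical exporter */

    static int
    my_getbuffer(MyArrayObject *self, struct bufferinfo *view)
    {
        size_t nbytes = self->nd * sizeof(Py_ssize_t);
        Py_ssize_t *shape = malloc(nbytes);
        Py_ssize_t *strides = malloc(nbytes);
        if (shape == NULL || strides == NULL) {
            free(shape); free(strides);
            return -1;
        }
        /* provider-owned copies: the exporter may now re-assign its
           own shape/strides without invalidating the held buffer */
        memcpy(shape, self->dimensions, nbytes);
        memcpy(strides, self->strides, nbytes);
        view->buf = self->data;
        view->ndims = self->nd;
        view->shape = shape;         /* freed again in releasebuffer */
        view->strides = strides;
        return 0;
    }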

>
>>> Second question: what happens if a view wants to re-export the 
>>> buffer? Do the views of the buffer ever change?  Example, say you 
>>> create a transposed view of a Numpy array.  Now you want a slice of 
>>> the transposed array.  What does the transposed view's getbuffer 
>>> export?
>>
>>
>> Basically, you could not alter the internal representation of the 
>> object while views which depended on those values were around.
>>
>> In NumPy, a transposed array actually creates a new NumPy object that 
>> refers to the same data but has its own shape and strides arrays.
>>
>> With the new buffer protocol, the NumPy array would not be able to 
>> alter it's shape/strides/or reallocate its data areas while views 
>> were being held by other objects.
>
>
> But requestors could alter their own copies of the data, no?  Back to 
> the transpose example: B itself obviously can't use the same "strides" 
> array as A uses.  It would have to create its own strides, right?


I don't like this example, because B does have its own strides since it 
is a complete NumPy array.   I think we are talking about the same 
thing and tha

[Python-Dev] Extended buffer PEP

2007-04-15 Thread Travis Oliphant


Here is my "final" draft of the extended buffer interface PEP. 

For those who have been following the discussion, I eliminated the 
releaser object and the lock-buffer function.   I decided that there is 
enough to explain with the new striding and sub-offsets without the 
added confusion of releasing buffers, especially when it is not clear 
what is to be gained by such complexity except a few saved lines of code.


The striding and sub-offsets, however, allow extension module writers to 
write code (say video and image processing code or scientific computing 
code or data-base processing code) that works on any object exposing the 
buffer interface.  I think this will be of great benefit and so is worth 
the complexity.


This will take some work to get implemented for Python 3k.  I could use 
some help with this in order to speed up the process.  I'm working right 
now on the extensions to the struct module until the rest is approved.


Thank you for any and all comments:

-Travis


:PEP: XXX
:Title: Revising the buffer protocol
:Version: $Revision: $
:Last-Modified: $Date:  $
:Authors: Travis Oliphant <[EMAIL PROTECTED]>, Carl Banks <[EMAIL PROTECTED]>
:Status: Draft
:Type: Standards Track
:Content-Type: text/x-rst
:Created: 28-Aug-2006
:Python-Version: 3000

Abstract
========

This PEP proposes re-designing the buffer interface (PyBufferProcs
function pointers) to improve the way Python allows memory sharing
in Python 3.0.

In particular, it is proposed that the character buffer portion 
of the API be eliminated and the multiple-segment portion be 
re-designed in conjunction with allowing for strided memory
to be shared.   In addition, the new buffer interface will 
allow sharing of the multi-dimensional nature of the memory
and of the data format the memory contains. 

This interface will allow any extension module to either 
create objects that share memory or create algorithms that
use and manipulate raw memory from arbitrary objects that 
export the interface. 


Rationale
=========

The Python 2.X buffer protocol allows different Python types to
exchange a pointer to a sequence of internal buffers.  This
functionality is *extremely* useful for sharing large segments of
memory between different high-level objects, but it is too limited and
has issues:

1. There is the little-used "sequence-of-segments" option
   (bf_getsegcount) that is not well motivated. 

2. There is the apparently redundant character-buffer option
   (bf_getcharbuffer).

3. There is no way for a consumer to tell the buffer-API-exporting
   object it is "finished" with its view of the memory and
   therefore no way for the exporting object to be sure that it is
   safe to reallocate the pointer to the memory that it owns (for
   example, the array object reallocating its memory after sharing
   it with the buffer object which held the original pointer led
   to the infamous buffer-object problem).

4. Memory is just a pointer with a length. There is no way to
   describe what is "in" the memory (float, int, C-structure, etc.)

5. There is no shape information provided for the memory.  But,
   several array-like Python types could make use of a standard
   way to describe the shape-interpretation of the memory
   (wxPython, GTK, pyQT, CVXOPT, PyVox, Audio and Video
   Libraries, ctypes, NumPy, data-base interfaces, etc.)

6. There is no way to share discontiguous memory (except through
   the sequence of segments notion).  

   There are two widely used libraries that use the concept of
   discontiguous memory: PIL and NumPy.  Their view of discontiguous
   arrays is different, though.  The proposed buffer interface allows
   sharing of either memory model.  Exporters will use only one
   approach and consumers may choose to support discontiguous 
   arrays of each type however they choose. 

   NumPy uses the notion of constant striding in each dimension as its
   basic concept of an array. With this concept, a simple sub-region
   of a larger array can be described without copying the data.
   Thus, stride information is the additional information that must be
   shared. 

   The PIL uses a more opaque memory representation. Sometimes an
   image is contained in a contiguous segment of memory, but sometimes
   it is contained in an array of pointers to the contiguous segments
   (usually lines) of the image.  The PIL is where the idea of multiple
   buffer segments in the original buffer interface came from.   

   NumPy's strided memory model is used more often in computational
   libraries and because it is so simple it makes sense to support
   memory sharing using this model.  The PIL memory model is sometimes 
   used in C code where a 2-d array can then be accessed using double
   pointer indirection: e.g., image[i][j].  

   The buffer interface should allow the object to export either of these
   memory models.  Consumers are free to either re

Re: [Python-Dev] Extended buffer PEP

2007-04-15 Thread Travis Oliphant
Greg Ewing wrote:

> Travis Oliphant wrote:
>
>> Carl Banks wrote:
>> > I'd like to see it accept a flags argument over what kind of buffer 
>> > it's allowed to return.  I'd rather not burden the user to check 
>> > all the entries in bufferinfo to make sure it doesn't get something 
>> > unexpected.
>> Yes, I agree. We had something like that at one point.
>
>
> Maybe this could be handled in an intermediate layer
> between the user and implementor of the interface,
> i.e. the user calls
>
>   PyBuffer_GetBuffer(obj, &info, flags);
>
> the object's tp_getbufferinfo just gets called as
>
>   getbufferinfo(self, &info)
>
> and PyBuffer_GetBuffer then checks that the result
> conforms to the requested feature set. This would
> relieve users of the interface from having to check
> that themselves, while not requiring implementors
> to be burdened with it either.

I like this strategy.  Then, any intermediate buffering (the need that 
prompted the now-removed release-buffer object in the protocol) could be 
handled in this layer as well.

I also like the idea of passing something to the getbuffer call so that 
exporters can do less work when some things are not being requested, 
though the exporter should be free to ignore the flags and always 
produce everything.

-Travis



Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-18 Thread Travis Oliphant
Greg Ewing wrote:
> Carl Banks wrote:
>
>> Py_BUF_REQUIRE_READONLY - Raise an exception if the buffer is writable.
>
> Is there a use case for this?

Yes.  The idea is used in NumPy all the time.

Suppose you want to write to an array but only have an algorithm that 
works with contiguous data.  Then you need to make a copy of the data 
into a contiguous buffer but you would like to make the original memory 
read-only until you are done with the algorithm and have copied the data 
back into memory.

That way, when you release the GIL, the memory area will be read-only, 
and so others won't write to it (because any writes would be wiped out 
by the copy back when the algorithm is done).

NumPy uses this idea all the time in its UPDATE_IF_COPY flag.
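
In outline, the pattern being described is something like this (a sketch 
of the idea, not NumPy's actual code; the helper names are 
hypothetical):

    /* 1. grab the buffer, asking the exporter to treat the original
          memory as read-only while we hold it */
    get_buffer(obj, &view, Py_BUF_REQUIRE_READONLY);
    /* 2. run the contiguous-only algorithm on a scratch copy */
    scratch = malloc(view.len);
    memcpy(scratch, view.buf, view.len);
    contiguous_algorithm(scratch, view.len);
    /* 3. lift the read-only restriction, copy the results back,
          and release everything */
    make_writeable_again(obj);
    memcpy(view.buf, scratch, view.len);
    free(scratch);
    release_buffer(obj, &view);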

-Travis



Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-18 Thread Travis Oliphant
Carl Banks wrote:
> Ok, I've thought quite a bit about this, and I have an idea that I 
> think will be ok with you, and I'll be able to drop my main 
> objection.  It's not a big change, either.  The key is to explicitly 
> say whether the flag allows or requires.  But I made a few other 
> changes as well.
I'm good with using an identifier to differentiate between an "allowed" 
flag and a "require" flag.   I'm not a big fan of 
VERY_LONG_IDENTIFIER_NAMES though.  Just enough to understand what it 
means but not so much that it takes forever to type and uses up 
horizontal real-estate.

We use flags in NumPy quite a bit, and I'm obviously trying to adapt 
some of this to the general case here, but I'm biased by my 10 years of 
experience with the way I think about NumPy arrays.

Thanks for helping out and offering your fresh approach.   I like a lot 
of what you've come up with.  There are a few modifications I would 
make, though.

>
> First of all, let me define how I'm using the word "contiguous": it's 
> a single buffer with no gaps.  So, if you were to do this: 
> "memset(bufinfo->buf,0,bufinfo->len)", you would not touch any data 
> that isn't being exported.

Sure; we call this NPY_ONESEGMENT in NumPy-speak, because "contiguous" 
could mean either NPY_C_CONTIGUOUS or NPY_F_CONTIGUOUS.   We also don't 
use the terms ROW_MAJOR and COLUMN_MAJOR, and I'm not a big fan of 
bringing them up in the Python space: the NumPy community has already 
learned the C_ and F_ terminology, which also generalizes to multiple 
dimensions more clearly without relying on 2-d concepts.
>
> Without further ado, here is my proposal:
>
>
> --
>
> With no flags, the PyObject_GetBuffer will raise an exception if the 
> buffer is not direct, contiguous, and one-dimensional.  Here are the 
> flags and how they affect that:

I'm not sure what you mean by "direct" here.  But, this looks like the 
Py_BUF_SIMPLE case (which was a named-constant for 0) in my proposal.
The exporter receiving no flags would need to return a simple buffer 
(and it wouldn't need to fill in the format character either --- 
valuable information for the exporter to know).
>
> Py_BUF_REQUIRE_WRITABLE - Raise exception if the buffer isn't writable.
WRITEABLE is an alternative spelling and the one that NumPy uses.   So, 
either include both of these as alternatives or just use WRITEABLE.
>
> Py_BUF_REQUIRE_READONLY - Raise an exception if the buffer is writable.
Or if the object's memory can't be made read-only when it is writeable.
>
> Py_BUF_ALLOW_NONCONTIGUOUS - Allow noncontiguous buffers.  (This turns 
> on "shape" and "strides".)
>
Fine.
> Py_BUF_ALLOW_MULTIDIMENSIONAL - Allow multidimensional buffers.  (Also 
> turns on "shape" and "strides".)
Just use ND instead of MULTIDIMENSIONAL, and only turn on "shape" if it 
is present.
>
> (Neither of the above two flags implies the other.)
>

> Py_BUF_ALLOW_INDIRECT - Allow indirect buffers.  Implies 
> Py_BUF_ALLOW_NONCONTIGUOUS and Py_BUF_ALLOW_MULTIDIMENSIONAL. (Turns 
> on "shape", "strides", and "suboffsets".)
If we go with this consumer-oriented naming scheme, I like indirect also.
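
For reference, what an "indirect" consumer has to do with suboffsets 
boils down to a pointer walk like this (a sketch of the PEP's 
convention, where a negative suboffset means "no dereference in this 
dimension"):

    void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides,
                           Py_ssize_t *suboffsets, Py_ssize_t *indices)
    {
        char *pointer = (char *)buf;
        int i;
        for (i = 0; i < ndim; i++) {
            pointer += strides[i] * indices[i];
            if (suboffsets[i] >= 0) {
                /* PIL-style dimension: follow the stored pointer,
                   then skip to the start of the payload */
                pointer = *((char **)pointer) + suboffsets[i];
            }
        }
        return (void *)pointer;
    }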
>
> Py_BUF_REQUIRE_CONTIGUOUS_C_ARRAY or Py_BUF_REQUIRE_ROW_MAJOR - Raise 
> an exception if the array isn't a contiguous array in C 
> (row-major) format.
>
> Py_BUF_REQUIRE_CONTIGUOUS_FORTRAN_ARRAY or Py_BUF_REQUIRE_COLUMN_MAJOR 
> - Raise an exception if the array isn't a contiguous array in 
> Fortran (column-major) format.
Just name them C_CONTIGUOUS and F_CONTIGUOUS like in NumPy.
>
> Py_BUF_ALLOW_NONCONTIGUOUS, Py_BUF_REQUIRE_CONTIGUOUS_C_ARRAY, and 
> Py_BUF_REQUIRE_CONTIGUOUS_FORTRAN_ARRAY all conflict with each other, 
> and an exception should be raised if more than one are set.
>
> (I would go with ROW_MAJOR and COLUMN_MAJOR: even though the terms 
> only make sense for 2D arrays, I believe the terms are commonly 
> generalized to other dimensions.)
As I mentioned, there is already a well-established history with NumPy; 
we've dealt with this issue already.
>
> Possible pseudo-flags:
>
> Py_BUF_SIMPLE = 0;
> Py_BUF_ALLOW_STRIDED = Py_BUF_ALLOW_NONCONTIGUOUS
>     | Py_BUF_ALLOW_MULTIDIMENSIONAL;
>
> --
>
> Now, for each flag, there should be an associated function to test the 
> condition, given a bufferinfo struct.  (Though I suppose they don't 
> necessarily have to map one-to-one, I'll do that here.)
>
> int PyBufferInfo_IsReadonly(struct bufferinfo*);
> int PyBufferInfo_IsWritable(struct bufferinfo*);
> int PyBufferInfo_IsContiguous(struct bufferinfo*);
> int PyBufferInfo_IsMultidimensional(struct bufferinfo*);
> int PyBufferInfo_IsIndirect(struct bufferinfo*);
> int PyBufferInfo_IsRowMajor(struct bufferinfo*);
> int PyBufferInfo_IsColumnMajor(struct bufferinfo*);
>
> The function PyObject_GetBuffer then has a pretty obvious 
> implementation.  Here is an except:
>
> if ((flags & Py_BUF_REQUIRE_READONLY) &&
> !PyBufferInfo_IsReadonly(&bufinfo)) {
> PyExc_

Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-19 Thread Travis Oliphant
Carl Banks wrote:
> Travis Oliphant wrote:
>> Carl Banks wrote:
>>> Ok, I've thought quite a bit about this, and I have an idea that I 
>>> think will be ok with you, and I'll be able to drop my main 
>>> objection.  It's not a big change, either.  The key is to explicitly 
>>> say whether the flag allows or requires.  But I made a few other 
>>> changes as well.
>> I'm good with using an identifier to differentiate between an 
>> "allowed" flag and a "require" flag.   I'm not a big fan of 
>> VERY_LONG_IDENTIFIER_NAMES though.  Just enough to understand what it 
>> means but not so much that it takes forever to type and uses up 
>> horizontal real-estate.
>
> That's fine with me.  I'm not very particular about spellings, as long 
> as they're not misleading.
>
>>> Now, here is a key point: for these functions to work (indeed, for 
>>> PyObject_GetBuffer to work at all), you need enough information in 
>>> bufinfo to figure it out.  The bufferinfo struct should be 
>>> self-contained; you should not need to know what flags were passed 
>>> to PyObject_GetBuffer in order to know exactly what data you're 
>>> looking at.
>> Naturally.
>>
>>> Therefore, format must always be supplied by getbuffer.  You cannot 
>>> tell if an array is contiguous without the format string.  (But see 
>>> below.)
>>
>> No, I don't think this is quite true.   You don't need to know what 
>> "kind" of data you are looking at if you don't get strides.  If you 
>> use the SIMPLE interface, then both consumer and exporter know the 
>> object is looking at "bytes" which always has an itemsize of 1.
>
> But doesn't this violate the above maxim?  Suppose these are the 
> contents of bufinfo:
>
> ndim = 1
> len = 20
> shape = (10,)
> strides = (2,)
> format = NULL

In my thinking, format/itemsize is necessary if you have strides (as you 
do here) but not needed if you don't have strides information (i.e. you 
are assuming a C_CONTIGUOUS memory chunk).   The intent of the simple 
interface is basically to let consumers mimic the old buffer protocol 
very easily. 
>
> How does it know whether it's looking at contiguous array of 10 
> two-byte objects, or a discontiguous array of 10 one-byte objects, 
> without having at least an item size?  Since item size is now in the 
> mix, it's moot, of course.

My only real concern is to have some way to tell the exporter that it 
doesn't need to "figure out" the format if the consumer doesn't care 
about it.  Given the open-ended nature of the format string, it is 
possible that a costly format-string construction step could be 
undertaken even when the consumer doesn't care about it.

I can see you are considering the buffer structure as a 
self-introspecting structure, whereas I was considering it only in terms of 
how the consumer would be using its members (which implied it knew what 
it was asking for and wouldn't touch anything else).

How about we assume FORMAT will always be filled in, but add a 
Py_BUF_REQUIRE_PRIMITIVE flag that allows only "primitive" format 
strings (i.e. basic C types)?   An exporter receiving this flag would 
have to return complicated data-types as 'bytes'.   I would add this to 
the Py_BUF_SIMPLE default.

>
> The idea that Py_BUF_SIMPLE implies bytes is news to me.  What if you 
> want a contiguous, one-dimensional array of an arbitrary type?  I was 
> thinking this would be acceptable with Py_BUF_SIMPLE.
Unsigned bytes are just the lowest common denominator.  They represent 
the old way of sharing memory.   Doesn't an "arbitrary type" mean 
bytes?  Or did you mean what if you wanted a contiguous, one-dimensional 
array of a *specific* type?

>   It seems you want to require Py_BUF_FORMAT for that, which would 
> suggest to me that
> But now it seems even more unnecessary than it did before.  
> Wouldn't any consumer that just wants to look at a chunk of bytes 
> always use Py_BUF_FORMAT, especially if there's danger of a 
> presumptuous exporter raising an exception?
>
I'll put in the REQUIRE_PRIMITIVE_FORMAT idea in the next update to the 
PEP.  I can just check in my changes to SVN, so it should show up by 
Friday.

Thanks again,

-Travis



Re: [Python-Dev] Google spreadsheet to collaborate on backporting Py3K stuff to 2.6

2007-09-06 Thread Travis Oliphant
Brett Cannon wrote:
> Neal, Anthony, Thomas W., and I have a spreadsheet that was started to
> keep track of what needs to be done in 2.6
> for Py3K transitioning:
> http://spreadsheets.google.com/pub?key=pCKY4oaXnT81FrGo3ShGHGg .  I am
> opening the spreadsheet up to everyone so that others can help
> maintain it.
> 
> There is a sheet in the Python 3000 Tasks spreadsheet that should be
> merged into this spreadsheet and then deleted.  If anyone wants to
> help with that it would be great (once something has been moved from
> "Python 3000 Tasks" to "Python 2 -> 3 transition" just delete it from
> "Python 3000 Tasks").
> 
> Because Neal created this spreadsheet he is the only one who can open
> editing to everyone.  If you would like to have edit abilities to the
> spreadsheet just reply to this email saying you want an invite and I
> will add you manually (and if you want a different address added just
> say so).

I would like an invite.

Thanks.

-Travis



[Python-Dev] Patch for adding offset to mmap

2007-10-22 Thread Travis Oliphant
Hi all,

I think the latest patch for fixing Issue 708374 (adding offset to mmap)
should be committed to SVN.

I will do it, if nobody opposes the plan.  I think it is a very
important addition and greatly increases the capability of the mmap module.

Thanks,

-Travis Oliphant


P.S.  Initially sent this to the wrong group (I've been doing that a lot 
lately --- too many groups seen through gmane...).  Apologies for 
multiple postings.



Re: [Python-Dev] Error in PEP3118?

2008-01-23 Thread Travis Oliphant
Thomas Heller wrote:
> Hi Travis,
> 
> The pep contains this sample:
> 
> """
> Nested array
> ::
> 
> struct {
>  int ival;
>  double data[16*4];
> }
> """i:ival: 
>(16,4)d:data:
> """
> """
> 
> I think it is wrong and must be changed to the following; is this correct?
> 
> """
> Nested array
> ::
> 
> struct {
>  int ival;
>  double data[16][4];
> }
> """i:ival: 
>(16,4)d:data:
> """
> """

I responded off list to this email and wanted to summarize my response 
for others to peruse.

Basically,  the answer is that the struct syntax proposed for 
multi-dimensional arrays is not intended to mimic how the C-compiler 
handles statically defined C-arrays (i.e. the pointer-to-pointers style 
of multi-dimensional arrays).  It is intended to handle the 
contiguous-block-of-data style of multi-dimensional arrays that NumPy uses.

I wanted to avoid 2-d static arrays in the examples because it gets 
confusing, and AFAIK the layout of the memory for a double data[16][4] 
is the same as for data[16*4].  The only difference is how the 
C-compiler translates an access like data[4][3] versus data[4*4 + 3].

The intent of the struct syntax is to handle describing memory.  The 
point is not to replicate how the C-compiler deals with statically 
defined N-D arrays.  Thus, even though the struct syntax allows 
*communicating* the intent of a contiguous block of memory inside a 
structure as an N-d array, the fundamental memory block is the 
equivalent of a 1-d array in C.

So, I think the example is correct (and intentional).
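
The layout claim is easy to check directly in standalone C (independent 
of the PEP):

    #include <assert.h>

    int main(void)
    {
        double a[16][4];       /* "2-d" static array  */
        double b[16 * 4];      /* flat 1-d equivalent */
        /* both are single contiguous blocks of 64 doubles ... */
        assert(sizeof(a) == sizeof(b));
        /* ... and indexing differs only in the arithmetic the compiler
           emits: a[i][j] names the same slot as b[i*4 + j] */
        assert((char *)&a[4][3] - (char *)a ==
               (char *)&b[4*4 + 3] - (char *)b);
        return 0;
    }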

-Travis O.








Re: [Python-Dev] Error in PEP3118?

2008-02-11 Thread Travis Oliphant
Thomas Heller wrote:
> Travis Oliphant schrieb:
> 
>>
>> The intent of the struct syntax is to handle describing memory.  The 
>> point is not to replicate how the C-compiler deals with statically 
>> defined N-D arrays.  Thus, even though the struct syntax allows 
>> *communicating* the intent of a contiguous block of memory inside a 
>> structure as an N-d array, the fundamental memory block is the 
>> equivalent of a 1-d array in C.
>>
>> So, I think the example is correct (and intentional).
> 
> Sorry, I do not think so.  If you use a 2-d array in the example, you
> must describe it correctly.  The difference between this pep and the old
> buffer interface is that the pep allows describing both how the compiler
> sees the memory block plus the size and layout of the memory block, while
> the old buffer interface only describes single-segment memory blocks.
> And 'double data[16][4]' *is* a single memory block containing a 2-d array,
> and *not* an array of pointers.
> 

I don't understand what you mean by "must describe it correctly".   The 
PEP's description of the size and layout of the memory block is not 
supposed to depend on the C-compiler.  It should also be able to 
describe memory as used in Fortran, C#, a file, or whatever.  So, I don't 
understand the insistence that the example use C-specific 2-d array 
syntax.

The example as indicated is correct.  It is true that the 2-d nature of 
the block of data is only known by Python in this example.  You could 
argue that it would be more informative by showing the C-equivalent 
structure as a 2-d array.  However, it would also create the possibility 
of confusion by implying an absolute relationship between the C-compiler 
and the type description.

Your insistence that the example is incorrect makes me wonder what point 
is not being communicated between us.  Clearly there is overlap between 
C structure syntax and the PEP syntax, but the PEP type syntax allows 
for describing data in ways that the C compiler doesn't.

I'd rather steer people away from statically defined arrays in C and 
don't want to continually explain how they are subtly different.

My perception is that you are seeing too much of a connection between 
the C-compiler and the PEP description of memory.   Perhaps that's not 
it, and I'm missing something else.

Best regards,


-Travis O.



Re: [Python-Dev] Error in PEP3118?

2008-02-19 Thread Travis Oliphant
Lisandro Dalcin wrote:
> On 2/11/08, Travis Oliphant <[EMAIL PROTECTED]> wrote:
>> My perception is that you are seeing too much of a connection between
>> the C-compiler and the PEP description of memory.   Perhaps that's not
>> it, and I'm missing something else.
>>
> 
> Travis, all this makes me believe that (perhaps) the 'format'
> specification in the new buffer interface is missing the 'C' or 'F'
> ordering in the case of a contiguous block. Am I missing something? Or
> should we always assume a 'C' ordering?

There is an ability to specify 'F' for the overall buffer.   In the 
description of each element, however (i.e. in the struct-syntax), the 
multi-dimensional character is always communicated in 'C' order (the 
last dimension varies the fastest).

I thought about adding the ability to specify the multi-dimensional 
order as 'F' in the struct-syntax for each element, but decided against 
it, since you can simulate 'F' order by thinking of the array in 
transposed fashion:  i.e., your 3x5 Fortran-order array is really a 5x3 
C-order array.
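
Concretely (a standalone illustration, not part of the PEP): element 
(i, j) of a 3x5 Fortran-order block lives at the same offset as element 
(j, i) of a 5x3 C-order block over the same memory:

    /* 3x5 column-major: offset(i, j) = i + 3*j
       5x3 row-major:    offset(r, c) = 3*r + c
       With r = j and c = i the two offsets coincide, which is why an
       'F' block can be consumed as the transpose of a 'C' block. */
    Py_ssize_t i = 2, j = 4;   /* any valid pair: 0 <= i < 3, 0 <= j < 5 */
    Py_ssize_t f_offset = i + 3*j;       /* (i, j) in the 3x5 F-array */
    Py_ssize_t c_offset = 3*j + i;       /* (j, i) in the 5x3 C-array */
    assert(f_offset == c_offset);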

Of course, the same is true on the larger scale when we are talking 
about multi-dimensional arrays of "elements," but on that level 
connecting with Fortran libraries is much more common and so we have 
found the help useful in NumPy.

-Travis O.



Re: [Python-Dev] 2.6 and 3.0 tasks

2008-03-16 Thread Travis Oliphant
Guido van Rossum wrote:
> Moving this to a new subject to keep the discussion of tasks and the
> discussion of task tracking tools separate.
> 
> On Sun, Mar 16, 2008 at 9:42 AM, Christian Heimes <[EMAIL PROTECTED]> wrote:
>>  I did a quick brainstorming with me, myself and I. I came up with a list
>>  of (IMHO) important tasks.
>>
>>  * Stabilize the C API of Python 3.0. I like to rename several prefixes
>>  to reduce the confusion: PyBytes -> PyByteArray,
> 
> +1 (also +1 to backporting this to 2.6)
> 
>> PyString -> PyBytes ...
> 
> -1. This will make merging code from 2.6 harder, and causes more work
> for porting C extensions.
> 
>>  * Backport the new buffer protocol to 2.6. I spoke to Travis yesterday
>>  and he said he is trying to get it done during the PyCon sprint. Maybe
>>  somebody can assist him?
> 
> Does he need assistance?

I don't really need help with back-porting the protocol.   However, I do 
need help with the struct module changes.  That is a standard-library 
module that I'm hoping to get help with.

-Travis




Re: [Python-Dev] PEP 361: Python 2.6/3.0 release schedule

2008-03-18 Thread Travis Oliphant
Barry Warsaw wrote:
> Greetings from Pycon 2008!
> 
> Neal Norwitz and I have worked out the schedule for Python 2.6 and 3.0, 
> which will be released in lockstep.  We will be following a monthly 
> release schedule, with releases to occur on the first Wednesday of the 
> month.  We'll move to a 2 week schedule for the release candidates.
> 

Hey Barry,

Thanks for putting this PEP together.  This is really helpful.

I didn't see discussion of PEP 3118 and its features being back-ported 
to Python 2.6.  I've already back-ported the new buffer API as an 
addition to the old buffer protocol.

In addition, I've planned to back-port the improvements to the struct 
module and the addition of the memoryview object (both in PEP 3118).

If you have questions, we can talk tomorrow.

Best regards,

-Travis Oliphant



Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-10 Thread Travis Oliphant

Antoine Pitrou wrote:

Hello,

The Py_buffer struct has two pointers named `shape` and `strides`. Each points
to an array of Py_ssize_t values whose length is equal to the number of
dimensions of the buffer object. Unfortunately, the buffer protocol spec doesn't
explain how allocation of these arrays should be handled.



I'm coming in late to this discussion, so I apologize for being out of 
order.   But, as Nick later clarifies, the PEP *does* specify how 
allocation of these arrays is handled.


Specifically, it is the responsibility of the exporter to do it and keep 
them correct as long as the buffer is shared.


I have not been able to keep up with the python-dev mailing lists since 
I have been working full time outside of academia.   I apologize for the 
difficulty this may have caused.  But, I have been available via email 
and am happy to respond to specific questions regarding the buffer 
protocol and its implementation.


I will make some time during December to help clean up confusing issues. 
 There are still pieces to implement as well (the enhancements to the 
struct module, for example), but I will not have time for this in the 
next 6 months because I would like to spend any time I can find on 
porting NumPy to use the new buffer protocol as part of getting NumPy 
ready for 3.0.


-Travis



Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-10 Thread Travis Oliphant

Alexander Belopolsky wrote:

On Mon, Dec 8, 2008 at 6:25 PM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
..

Alexander's suggestion of going and looking at what the numpy folks have
done in this area is probably a good idea too.

Well, I'm open to others doing this, but I won't do it myself. My interest is in
fixing the most glaring bugs of the buffer API and memoryview object. The numpy
folks are welcome to voice their opinions and give advice on python-dev.



I did not follow numpy development for the last year or more, so I
won't qualify as "the numpy folks," but my understanding is that numpy
does exactly what Nick recommended: the viewed object owns shape and
strides just as it owns the data.  The viewing object increases the
reference count of the viewed object and thus assures that data, shape
and strides don't go away prematurely.

I am copying Travis, the author of the PEP 3118, hoping that he would
step in on behalf of "the numpy folks."


I appreciate the copy.  As I mentioned, I have not had time to follow 
python-dev in detail this year, but I'm glad to help maintain the buffer 
protocol and share any information I can.


I think Nick understands the situation: the exporter is responsible for 
allocating and freeing the shape, strides, and suboffsets memory (as 
well as the format and buf memory).   How it does this is not specified 
and is open for interpretation by the objects.  In the standard library 
there is nothing that needs anything complicated, and I'm comfortable 
with what I wrote previously to support the objects in the standard 
library.


There is a length bug in the memoryview implementation, but that is a 
separate issue and is being handled.


NumPy will have to handle sharing shape and strides information and will 
serve as a reference implementation when that support is added.


-Travis



Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-10 Thread Travis Oliphant

Antoine Pitrou wrote:

Alexander Belopolsky  gmail.com> writes:

I did not follow numpy development for the last year or more, so I
won't qualify as "the numpy folks," but my understanding is that numpy
does exactly what Nick recommended: the viewed object owns shape and
strides just as it owns the data.  The viewing object increases the
reference count of the viewed object and thus assures that data, shape
and strides don't go away prematurely.


That doesn't work if e.g. you take a slice of a memoryview object, since the
shape changes in the process.
See http://bugs.python.org/issue4580




I think there was some confusion about how to support slicing with 
memory view objects.  I remember thinking about it but never getting 
around to writing the code.   The memory object is both an exporter and 
a consumer of the buffer protocol.  It can have its own semantics about 
storing shape and strides information separate from the buffer protocol.


The memory view object needs some way to translate the information it 
gets from the underlying object to the consumer of the information.


My thinking is that the memory view object itself will allocate shape 
and strides information as it needs it.


-Travis



Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-10 Thread Travis Oliphant

Antoine Pitrou wrote:

Nick Coghlan  gmail.com> writes:

For the slicing problem in particular, memoryview is currently trying to
get away with only one Py_buffer object when it needs TWO.


Why should it need two? Why couldn't the embedded Py_buffer fulfill all the
needs of the memoryview object? If the memoryview can't be a relatively thin
object-oriented wrapper around a Py_buffer, then this all screams failure to me.



The advice to look at NumPy is good because memoryview is modeled after 
NumPy -- and never completed.


When a slice view is made, a new memoryview object is created with a 
Py_buffer structure that needs to allocate its own shape and strides 
(or something that will allow correct shape and strides to be reported 
to any consumer).  In this way, there are two Py_buffer structures.


I do not remember implementing slicing for memoryview objects and it 
looks like the problem is there.






In all honesty, I admit I am annoyed by all the problems with the buffer API /
memoryview object, many of which are caused by its utterly bizarre design (and
the fact that the design team went missing in action after imposing such a
bizarre and complex design on us), and I'm reluctant to add yet another level of
byzantine complexity in order to solve those problems. It explains I may sound a
bit angry at times :-)


I understand your frustration, but I've been here (just not able to 
follow python-dev), and I've tried to respond to issues that came to my 
attention.   I did not have time to complete the memoryview 
implementation, but that does not mean the buffer API is "bizarre".


Yes, the cobbled-together memoryview object itself may be "bizarre", but 
that is sometimes the reality of volunteer work.  Just ignore the 
memoryview object if it does not meet your needs.


Please let me know what other problems exist.



If we really need to change things a lot to make them work, we should re-work
the buffer API from the ground up, make the Py_buffer struct a true PyObject
(that is, a true variable-length object so as to solve the shape and strides
allocation issue) and merge it with the current memoryview implementation. It
would make things both simpler and more flexible.



The only place there is a shape/strides allocation issue is with the 
memoryview object itself.   There is not an issue as far as I can see 
with the buffer protocol itself.


I'm glad you are trying to help clean up the memoryview implementation. 
I welcome the eyes and the keystrokes.  Are you familiar at all 
with NumPy?  That may help you understand what you currently consider to 
be "utterly bizarre"


Best regards,

-Travis



Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-10 Thread Travis Oliphant

Greg Ewing wrote:

Antoine Pitrou wrote:

Why should it need two? Why couldn't the embedded Py_buffer fulfill 
all the needs of the memoryview object? 


Two things here:

  1) The memoryview should *not* be holding onto a Py_buffer
 in between calls to its getitem and setitem methods. It
 should request one from the underlying object when needed
 and release it again as soon as possible.



This is actually a different design than the PEP calls for.  From the PEP:

   This is functionally similar to the current buffer object except a
reference to base is kept and the memory view is not re-grabbed.
Thus, this memory view object holds on to the memory of base until it
is deleted.

I'm open to this changing, but it is the current PEP.



  2) The "second" Py_buffer referred to above only needs to
 be materialized when someone makes a GetBuffer request on
 the memoryview itself. It's not needed for Python getitem
 and setitem calls. (The implementation might choose to
 implement these by creating a temporary Py_buffer, but
 again, it would only last as long as the call.)


The memoryview object will need to store some information for 
re-calculating strides, shape, and sub-offsets for consumers.





If the memoryview can't be a relatively thin
object-oriented wrapper around a Py_buffer, then this all screams 
failure to me.


It shouldn't be a wrapper around a Py_buffer, it should be a
wrapper around the buffer *interface* of the underlying object.



This is a different object than what was proposed, but I'm not opposed 
to it.



It sounds to me like whoever wrote the memoryview implementation
didn't understand how the buffer interface is meant to be used.
That doesn't mean there's anything wrong with the buffer interface.

I have some doubts myself about whether it needs to be as
complicated as it is, but I think the basic idea is sound:
that Py_buffer objects are ephemeral, to be obtained when
needed and not kept for any longer than necessary.



I'm all for simplifying as much as possible.  There are some things I 
understand very well (like how strides and shape information can be 
shared with views), but others that I'm trying to understand better 
(like whether holding on to a view or re-grabbing the view is better).


I think I'm leaning toward the re-grabbing concept.   I'm all for 
improving the memoryview object, but let's not confuse that effort with 
the buffer API implementation.
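
For what it's worth, here is a minimal sketch of the re-grabbing pattern 
as a consumer would write it against the current C-API (function name 
hypothetical); the view lives only for the duration of one operation:

#include <Python.h>

/* Grab the buffer, use it, and release it immediately. */
static long
sum_first_bytes(PyObject *obj)
{
    Py_buffer view;
    Py_ssize_t i;
    long total = 0;

    if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0)
        return -1;
    for (i = 0; i < view.len && i < 16; i++)
        total += ((unsigned char *)view.buf)[i];
    PyBuffer_Release(&view);   /* nothing is held between calls */
    return total;
}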


I do not think we need to worry about changes to the memoryview object, 
because I doubt anything outside of the standard library is using it yet.



-Travis




Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-10 Thread Travis Oliphant

Nick Coghlan wrote:

Antoine Pitrou wrote:

In all honesty, I admit I am annoyed by all the problems with the buffer API /
memoryview object, many of which are caused by its utterly bizarre design (and
the fact that the design team went missing in action after imposing such a
bizarre and complex design on us), and I'm reluctant to add yet another level of
byzantine complexity in order to solve those problems. That explains why I may
sound a bit angry at times :-)

If we really need to change things a lot to make them work, we should re-work
the buffer API from the ground up, make the Py_buffer struct a true PyObject
(that is, a true variable-length object so as to solve the shape and strides
allocation issue) and merge it with the current memoryview implementation. It
would make things both simpler and more flexible.


I don't see anything wrong with the PEP 3118 protocol. It does exactly
what it is designed to do: allow the number crunching crowd to share
large datasets between different libraries without copying things around
in memory. Yes, the protocol is complicated, but that is because it is
trying to handle a complicated problem.

The memoryview implementation on the other hand is pretty broken. I do
have a theory on how it ended up in such an unusable state, but I'm not
particularly inclined to share it - this kind of thing can happen
sometimes, and the important question now is how we fix it.



Thank you Nick.   This is a correct assessment of the situation.  I'd 
like to help improve memoryview as best I can.  It does need some 
thought about what we want memoryview to be.


I wanted memoryview to be able to be sliced and diced (much like NumPy 
arrays).  But I was only able to get around to implementing a simple 
view of the Py_buffer struct.



-Travis



Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-10 Thread Travis Oliphant

Antoine Pitrou wrote:

Nick Coghlan  gmail.com> writes:

I don't see anything wrong with the PEP 3118 protocol.


Apart from the fact that:
- it uses something (Py_buffer) which is not a PyObject and has totally
different allocation/lifetime semantics (which makes it non-trivial to adapt to
for anyone used to the rest of the C API)


 * this is a non-issue.   The Py_buffer struct is just a place-holder 
for a bunch of variables.  It could be a Python object, but that was 
seen as unnecessary.


- it has unsolved issues like allocation of the underlying shape and strides 
members


 * this is false.  It does specify how this is handled.


- it doesn't specify how to obtain e.g. a sub-buffer, or even duplicate an
existing one (which seem to be rather fundamental actions to me)


 * this is not part of the PEP.  Whether it's a deficiency or not is 
open to interpretation.




... I agree there's nothing wrong with it!


I'm glad you agree.




That Py_buffer describes the *whole* data store, but a memoryview slice
may only be exposing part of it - so while the info in the Py_buffer is
accurate for the underlying object, it is *not* accurate for the
memoryview itself.


And the problem here is that Py_buffer is/was (*) not flexible enough to allow
easy modification in order to take a sub-buffer without some annoying problems.



You are confusing the intent of the memoryview with the Py_buffer struct.

-Travis



Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-10 Thread Travis Oliphant

Greg Ewing wrote:

Nick Coghlan wrote:

Maintaining a PyDict instance to map from view pointers to shapes
and strides info doesn't strike me as a "complex scheme" though.


I don't see why a given buffer provider should ever need
more than one set of shape/strides arrays at a time. It
can allocate them on creation, reallocate them as needed
if the shape of its internal data changes, and deallocate
them when it goes away.



I agree.  NumPy has a single shape/strides array.  The intent was to 
share this through the buffer interface.
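
A rough sketch (struct and member names hypothetical) of what that 
sharing looks like from the exporter's side; the getbuffer slot just 
points the Py_buffer at arrays the object already owns, so nothing is 
allocated per request:

#include <Python.h>

typedef struct {
    PyObject_HEAD
    char *data;
    Py_ssize_t nbytes;
    Py_ssize_t itemsize;
    char *format;               /* e.g. "d" for doubles */
    int nd;
    Py_ssize_t *dimensions;     /* the single shape array */
    Py_ssize_t *strides;        /* the single strides array */
} MyArrayObject;

static int
myarray_getbuffer(MyArrayObject *self, Py_buffer *view, int flags)
{
    Py_INCREF(self);
    view->obj = (PyObject *)self;   /* keeps the exporter alive */
    view->buf = self->data;
    view->len = self->nbytes;
    view->readonly = 0;
    view->itemsize = self->itemsize;
    view->format = self->format;
    view->ndim = self->nd;
    view->shape = self->dimensions; /* owned by self, simply shared */
    view->strides = self->strides;  /* likewise */
    view->suboffsets = NULL;
    view->internal = NULL;
    return 0;
}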




If you are creating view objects that present slices or
some other alternative perspective, then the view object
itself is a buffer provider and should maintain shape/stride
arrays for its particular view of the underlying object.


Yes, that is correct.

-Travis



[Python-Dev] Clarification sought about including a multidimensional array object into Python core

2005-02-09 Thread Travis Oliphant
There has recently been some much-needed discussion on the 
numpy-discussions list run by sourceforge regarding the state of the 
multidimensional array objects available for Python.  It is desired by 
many that there be a single multidimensional array object in the Python 
core to facilitate data transfer and interfacing between multiple packages.

I am a co-author of the current PEP regarding inclusion of the 
multidimensional array object into the core.  However, that PEP is 
sorely outdated.  Currently there are two multidimensional array objects 
that are in use in the Python community:

  Numeric --- original arrayobject created by Jim Hugunin and many 
others.  Has been developed and used for 10 years.  An upgrade called 
Numeric3, which adds the features of numarray but maintains the same 
basic structure of Numeric, is in development and will be ready for more 
widespread use in a couple of weeks.

  Numarray --- in development for about 3 years.  It was billed by some 
as a replacement for Numeric.  While introducing some new features, it 
still has not covered the full feature set that Numeric had, making it 
impossible for all Numeric users to use it.  In addition, it is still 
unacceptably slow for many operations that Numeric does well. 

Scientific users will always have to install more packages in order to 
use Python for their purposes.  However, there is still the desire that 
the basic array object would be common among all Python users.   To 
assist in writing a new PEP, we need clarification from Guido and others 
involved regarding

1) What specifically about Numeric prevented it from being acceptable as 
an addition to the Python core.
2) Are there any fixed requirements (other than coding style) before an 
arrayobject would be accepted into the Python core.

Thanks for your comments.  I think they will help the discussion 
currently taking place.

-Travis Oliphant



Re: [Python-Dev] Clarification sought about including a multidimensional array object into Python core

2005-02-09 Thread Travis Oliphant
Martin v. Löwis wrote:
Travis Oliphant wrote:
I am a co-author of the current PEP regarding inclusion of the 
multidimensional array object into the core.  However, that PEP is 
sorely outdated.
[...]
1) What specifically about Numeric prevented it from being acceptable 
as an addition to the Python core.
2) Are there any fixed requirements (other than coding style) before 
an arrayobject would be accepted into the Python core.

I think you answered these questions yourself. If a PEP is sorely
outdated after only 3 years of its life, there clearly is something
wrong with the PEP. 
Exactly, the PEP does not reflect the reality of what anybody wants in 
the core.  It needs modification, or replacement.   Can I just do that?  
Or do I need permission from Barrett and others who have only a passing 
interest in this anymore?

Python language features will have to live
10 years or so before they can be considered outdated, and then
another 20 years before they can be removed (look at string
exceptions as an example).
I think you misunderstood my meaning.  For example, Numeric has lived 10 
years with very few changes.  It seems to me it is rather stable.

So if it is still not clear what kind of API would be adequate
after all these years, it is best (IMO) to wait a few more years
for somebody to show up with a good solution to the problem
(which I admit I don't understand).
It actually is pretty clear to many.  There have been a wide variety of 
modules written on top of Numeric and Numarray. Most of the rough 
spots around the edges have been ironed out.   Our arguments now are 
about packaging other code living on top of an arrayobject.

Thanks for your help,
-Travis


Re: [Python-Dev] Clarification sought about including a multidimensional array object into Python core

2005-02-09 Thread Travis Oliphant
David Ascher wrote:
I've not followed the num* discussion in quite a while, but my
impression back then was that there wasn't "one" such community. 
Instead, the technical differences in the approaches required in
specific fields, regarding things like the relative importance of
memory profiles, speed, error handling, willingness to require modern
C++ compilers, etc. made practical compromises quite tricky.
 

I really appreciate comments from those who remember some of the old 
discussions.

There are indeed some different needs.  Most of this, however, is in the 
ufunc object (how do you do math with the arrays).   And, a lot of this 
has been ameliorated with the new concepts of error modes that numarray 
introduced.

There is less argumentation over the basic array object as a memory 
structure.   The biggest argument right now is the design of the object: 
i.e.  a mixture of Python and C (numarray) versus a C-only object 
(Numeric3).

In other words, what I'm saying is that in terms of how the array object 
should be structured, a lot is known.  What is more controversial is 
whether the design should be built upon Numarray's object structure (a 
mixture of Python and C) or on Numeric's --- all in C.

-Travis



[Python-Dev] Re: Numeric life as I see it

2005-02-09 Thread Travis Oliphant

Martin v. Löwis wrote:
The PEP should list the options, include criteria
for selection, and then propose a choice. People can then discuss
whether the list of options is complete (if not, you need to extend
it), whether the criteria are agreed (they might be not, and there
might be difficult consensus, which the PEP should point out), and
whether the choice is the right one given the criteria (there should
be no debate about this - everybody should agree factually that the
choice meets the criteria best).
Unrealistic. I think it is undisputed that there are people with 
irreconcilably different needs. Frankly, we spent many, many months on 
the design of Numeric and it represents a set of compromises already. 
However, the one thing it wouldn't compromise on was speed, even at 
the expense of safety. A community exists that cannot live with this 
compromise. We were told that the Python core could also not live with 
that compromise.

I'm not sure I agree.  The ufuncobject is the only place where this 
concern existed (should we trip OverFlow, ZeroDivision, etc. errors 
during array math).   Numarray introduced and implemented the concept 
of error modes that can be pushed and popped.  I believe this is the 
right solution for the ufuncobject.

One question we are pursuing is could the arrayobject get into the core 
without a particular ufunc object.   Most see this as sub-optimal, but 
maybe it is the only way.

Over the years there was pressure to add safety, convenience, 
flexibility, etc., all sometimes incompatible with speed. Numarray 
represents in some sense the set of compromises in that direction, 
besides its technical innovations. Numeric / Numeric3 represents the 
need for speed camp.
I don't see numarray as representing this at all.  To me, numarray 
represents the desire to have more flexible array types (specifically 
record arrays and maybe character arrays).   I personally don't see 
Numeric3 as in any kind of "need for speed" camp either.  I've never 
liked this distinction, because I don't think it represents a true 
dichotomy.  To me,  the differences between Numeric3 and numarray are 
currently more "architectural" than implementational.

Perhaps you are referring to the fact that because numarray has several 
portions written in Python it is "more flexible" or "more convenient" 
but slower for small arrays.  If you are saying that then I guess 
Numeric3 is a "need for speed" implementation, and I apologize for not 
understanding. 

I think it is reasonable to suppose that the need for speed piece can 
be wrapped suitably by the need for safety-flexibility-convenience 
facilities. I believe that hope underlies Travis' plan.
If the "safety-flexibility-convenience" facilities can be specified, 
then I'm all for one implementation.  Numeric3 design goals do not go 
against any of these ideas intentionally.

The Nummies (the official set of developers) thought that the Numeric 
code base was an unsuitable basis for further development. There was 
no dissent about that at least. My idea was to get something like what 
Travis is now doing done to replace it. I felt it important to get 
myself out of the picture after five years as the lead developer 
especially since my day job had ceased to involve using Numeric.
Some of the parts needed to be re-written, but I didn't think that meant 
moving away from the goal to have a single C-type that is the 
arrayobject.   During this process Python 2.2 came out and allowed 
sub-classing from C-types.  As Perry mentioned, and I think needs to be 
emphasized again, this changed things as any benefit from having a 
Python-class for the final basic array type disappeared --- beyond ease 
of prototyping and testing.

However, removing my cork from the bottle released the unresolved 
pressure between these two camps. My plan for transition failed. I 
thought I had consensus on the goal and in fact it wasn't really 
there. Everyone is perfectly good-willed and clever and trying hard to 
"all just get along", but the goal was lost.  Eric Raymond should 
write a book about it called "Bumbled Bazaar".
This is an accurate description.  Fortunately, I don't think any 
ill-will exists (assuming I haven't created any with my recent 
activities).  I do want to "get-along."  I just don't want to be silent 
when there are issues that I think I understand.

I hope everyone will still try to achieve that goal. Interoperability 
of all the Numeric-related software (including supporting a 'default' 
plotting package) is required.
Utopia is always out of reach :-)
Aside: While I am at it, let me reiterate what I have said to the 
other developers privately: there is NO value to inheriting from the 
array class. Don't try to achieve that capability if it costs 
anything, even just effort, because it buys you nothing. Those of you 
who keep remarking on this as if it would simply haven't thought it 
through IMHO. It sounds so intellectually appealing that David Asc

Re: [Numpy-discussion] Re: [Python-Dev] Re: Numeric life as I see it

2005-02-09 Thread Travis Oliphant

[Travis]
 

I appreciate some of what Paul is saying here, but I'm not fully
convinced that this is still true with Python 2.2 and up new-style
c-types.   The concerns seem to be over the fact that you have to
re-implement everything in the sub-class because the base-class will
always return one of its objects instead of a sub-class object.
It seems to me, however,  that if the C methods use the object type
alloc function when creating new objects then some of this problem is
avoided (i.e. if the method is called with a sub-class type passed in,
then a sub-class type gets set).
   

This would severely constrain the __new__ method of the subclass.
 

I obviously don't understand the intricacies here, so fortunately it's 
not a key issue for me because I'm not betting the farm on being able to 
inherit from the arrayobject.  But, it is apparent that I don't 
understand all the issues.

Have you looked at how Python now allows sub-classing in C?  I'm not an
expert here, but it seems like a lot of the problems you were discussing
have been ameliorated.  There are probably still issues, but
I will know more when I seen what happens with a Matrix Object
inheriting from a Python C-array object.
   

And why would a Matrix need to inherit from a C-array? Wouldn't it
make more sense from an OO POV for the Matrix to *have* a C-array
without *being* one?
 

The only reason I'm thinking of here is to have it inherit from the 
C-array many of the default methods without having to implement them all 
itself.   I think Paul is saying that this never works with C-types like 
arrays, and I guess from your comments you agree with him.

The only real reason for wanting to construct a separate Matrix object 
is the need to overload the * operation to do matrix multiplication 
instead of element-by-element multiplication. 

-Travis



[Python-Dev] Re: [Numpy-discussion] Re: Numeric life as I see it

2005-02-10 Thread Travis Oliphant

One question we are pursuing is could the arrayobject get into the  
core without a particular ufunc object.   Most see this as  
sub-optimal, but maybe it is the only way.

Since all the arithmetic operations are in ufunc that would be a 
suboptimal solution, but indeed still a workable one.

I think replacing basic number operations of the arrayobject should be 
simple, so perhaps a default ufunc object could be worked out for 
inclusion.


I appreciate some of what Paul is saying here, but I'm not fully  
convinced that this is still true with Python 2.2 and up new-style  
c-types.   The concerns seem to be over the fact that you have to  
re-implement everything in the sub-class because the base-class will  
always return one of its objects instead of a sub-class object.

I'd say that such discussions should be postponed until someone  
proposes a good use for subclassing arrays. Matrices are not one, in 
my  opinion.

Agreed.  It is not critical to what I am doing, and I obviously need 
more understanding before tackling such things.  Numeric3 uses the new 
c-type largely because of the nice getsets table which is separate from 
the methods table.  This replaces the rather ugly C-functions getattr 
and setattr.

-Travis


[Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used

2005-02-18 Thread Travis Oliphant
Hello again,
There is a great discussion going on the numpy list regarding a proposed 
PEP for multidimensional arrays that is in the works.

During this discussion, a problem has resurfaced regarding slicing with 
objects that are not IntegerType objects but that have a 
tp_as_number->nb_int method to convert to an int.

Would it be possible to change
_PyEval_SliceIndex  in ceval.c
so that rather than throwing an error if the indexing object is not an 
integer, the code first checks to see if the object has a
tp_as_number->nb_int method and calls it instead.
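
For concreteness, here is a rough sketch of the check I have in mind, 
written as a stand-alone helper rather than the actual patch:

#include <Python.h>

/* Like _PyEval_SliceIndex: store the index in *pi and return 1 on
   success, 0 on error.  The new part is the nb_int fallback. */
static int
slice_index_sketch(PyObject *v, int *pi)
{
    if (PyInt_Check(v)) {
        *pi = (int)PyInt_AsLong(v);
        return 1;
    }
    if (v->ob_type->tp_as_number != NULL &&
        v->ob_type->tp_as_number->nb_int != NULL) {
        PyObject *i = v->ob_type->tp_as_number->nb_int(v);  /* __int__ */
        if (i == NULL)
            return 0;
        *pi = (int)PyInt_AsLong(i);
        Py_DECREF(i);
        return 1;
    }
    PyErr_SetString(PyExc_TypeError, "slice indices must be integers");
    return 0;
}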

If this is acceptable, it is an easy patch.
Thanks,
-Travis Oliphant


[Python-Dev] Fix _PyEval_SliceIndex (Take two)

2005-02-18 Thread Travis Oliphant
(More readable second paragraph)
Hello again,
There is a great discussion going on the numpy list regarding a proposed 
PEP for multidimensional arrays that is in the works.

During this discussion a problem has resurfaced regarding slicing with 
objects that are not IntegerType objects but that have a 
tp_as_number->nb_int method. Would it be possible to change

_PyEval_SliceIndex  in ceval.c
so that rather than raising an exception if the indexing object is not 
an integer, the code first checks to see if the object has a 
tp_as_number->nb_int method and tries it before raising an exception.

If this is acceptable, it is an easy patch.
Thanks,
-Travis Oliphant


Re: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used

2005-02-18 Thread Travis Oliphant
Guido van Rossum wrote:
Would it be possible to change
_PyEval_SliceIndex  in ceval.c
so that rather than throwing an error if the indexing object is not an
integer, the code first checks to see if the object has a
tp_as_number->nb_int method and calls it instead.
   

I don't think this is the right solution; since float has that method,
it would allow floats to be used as slice indices, 
 

O.K., then how about this: if arrayobjects make it into the core, allow 
a check for a rank-0 integer-type arrayobject before raising an 
exception?

-Travis


[Python-Dev] Using descriptors to dynamically attach methods written in Python to C-defined (new-style) types

2005-03-25 Thread Travis Oliphant
In updating Numeric to take advantage of the new features in Python, 
I've come across the need
to attach a Python-written function as a method to a C-builtin.  I don't 
want to inherit, I just want to extend the methods of a builtin type 
using a Python function.   I was thinking of updating the new type 
object's dictionary with a new entry that is a descriptor object.

It seems that the descriptor mechanism makes this a relatively 
straightforward thing.  My question is, can I use the already-available 
Descriptor objects to do this, or will I need to define another 
Descriptor object?  (Perhaps a PythonMethod descriptor object to 
complement the Method Descriptor.)
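
To frame the question, here is a minimal sketch of the mechanism I am 
considering (names hypothetical); since plain Python functions are 
themselves descriptors, inserting one into the type's tp_dict may 
already be enough:

#include <Python.h>

/* `type` is the C-builtin type; `func` is a Python-written function.
   Functions implement __get__, so attribute lookup then binds this
   like any other method. */
static int
attach_python_method(PyTypeObject *type, const char *name, PyObject *func)
{
    return PyDict_SetItemString(type->tp_dict, name, func);
}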

Any hints will be helpful.  

-Travis Oliphant



[Python-Dev] 64-bit sequence and buffer protocol

2005-03-29 Thread Travis Oliphant
I'm posting to this list to again generate open discussion on the 
problem in current Python that an int is used in both the Python 
sequence protocol and the Python buffer protocol. 

The problem is that a C-int is typically only 4 bytes long while there 
are many applications (mmap for example), that would like to access 
sequences much larger than can be addressed with 32 bits.   There are 
two aspects to this problem:

1) Some 64-bit systems still define a C-int as 4 bytes long (so even 
in-memory sequence objects could not be addressed using the sequence 
protocol).

2) Even 32-bit systems have occasion to sequence a more abstract object 
(perhaps it is not all in memory) which requires more than 32 bits to 
address. 

These are the solutions I've seen:
1) Convert all C-ints to Py_LONG_LONG in the sequence and buffer protocols.
2) Add new C-API's that mirror the current ones which use Py_LONG_LONG 
instead of the current int.

3) Change Python to use the mapping protocol first (even for slicing) 
when both the mapping and sequence protocols are defined.

4) Tell writers of such large objects to not use the sequence and/or 
buffer protocols and instead use the mapping protocol and a different 
"bytes" object (that currently they would have to implement themselves 
and ignore the buffer protocol C-API).

What is the opinion of people on this list about how to fix the 
problem?   I believe Martin was looking at the problem and had told 
Perry Greenfield he was "fixing it."  Apparently at the recent PyCon, 
Perry and he talked and Martin said the problem is harder than he had 
initially thought.  It would be good to document what some of these 
problems are so that the community can assist in fixing them.

-Travis O.



[Python-Dev] Pickling buffer objects.

2005-04-18 Thread Travis Oliphant
Before submitting a patch to pickle.py and cPickle.c,  I'd be interested 
in knowing how likely it is that a patch allowing Python to pickle the 
buffer object would be accepted.

The problem being solved is that Numeric currently has to copy all of 
its data into a string before writing it out to a pickle.  Yes, I know 
there are ways to write directly to a file.  But,  it is desirable to 
have Numeric arrays interact seamlessly with other pickleable types 
without a separate stream.   This is especially utilized for network 
transport. 

The patch would simply write the opcode for a Python string to the 
stream and then write the data straight from the void * pointer of the 
buffer object, without making an intermediate copy.
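
A rough sketch of that write path (FILE-based for brevity and ignoring 
the pickler's memo handling; function name hypothetical):

#include <stdio.h>

/* Emit pickle's BINSTRING opcode ('T') with a 4-byte little-endian
   length, then the buffer's bytes straight from the void * pointer;
   no intermediate string object is created. */
static int
save_buffer_as_string(FILE *fp, const void *ptr, unsigned long len)
{
    unsigned char header[5];

    header[0] = 'T';
    header[1] = (unsigned char)(len & 0xff);
    header[2] = (unsigned char)((len >> 8) & 0xff);
    header[3] = (unsigned char)((len >> 16) & 0xff);
    header[4] = (unsigned char)((len >> 24) & 0xff);
    if (fwrite(header, 1, 5, fp) != 5)
        return -1;
    if (fwrite(ptr, 1, len, fp) != len)
        return -1;
    return 0;
}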

Yes, I know all of the old arguments about the buffer object and that it 
should be replaced with something better.I've read all the old posts 
and am quite familiar with the issues about it.

But, this can be considered a separate issue.  Since the buffer object 
exists, it ought to be pickleable, and it would make a lot of 
applications a lot faster.  

I'm proposing to pickle the buffer object so that it unpickles as a 
string.  Arguably, there should be a separate mutable-byte object opcode 
so that buffer objects unpickle as mutable-byte buffer objects.   If 
that is more desirable, I'd even offer a patch to do that (though such 
pickles wouldn't unpickle under earlier versions of Python).   I suspect 
that the buffer object would need to be reworked into something more 
along the lines of the previously-proposed bytes object before a 
separate bytecode for pickleable mutable-bytes is accepted, however.

-Travis Oliphant


Re: [Python-Dev] Pickling buffer objects.

2005-04-19 Thread Travis Oliphant
Greg Ewing wrote:
Travis Oliphant wrote:
I'm proposing to pickle the buffer object so that it unpickles as a 
string.

Wouldn't this mean you're only solving half the problem?
Unpickling a Numeric array this way would still use an
intermediate string.

Well, actually, unpickling in the new numeric uses the intermediate 
string as the memory (yes, I know it's not supposed to be "mutable", but 
without a mutable bytes object what else are you supposed to do?). 

Thus, ideally we would have a mutable-bytes object with a separate 
pickle opcode.  Without this, we overuse the string object.  But, 
since the string is only created by the pickle (and nobody else uses 
it), what's the real harm?

So, in reality the previously-mentioned patch together with 
modifications to Numeric's unpickling code actually solves the whole 
problem.

-Travis


Re: [Python-Dev] Please comment on PEP 357 -- adding nb_index slot to PyNumberMethods

2006-02-17 Thread Travis Oliphant
Thomas Wouters wrote:
> On Fri, Feb 17, 2006 at 05:29:32PM +0100, Armin Rigo wrote:
> 
>>>   Where obj must be either an int or a long or another object that has 
>>> the
>>>   __index__ special method (but not self).
> 
> 
>>The "anything but not self" rule is not consistent with any other
>>special method's behavior.  IMHO we should just do the same as
>>__nonzero__():

Agreed.  I implemented the code, then realized this possible recursion 
problem while writing the specification.  I didn't know how it would be 
viewed.

It is easy enough to require __index__ to return an actual Python 
integer because for anything that has the nb_index slot you would just 
return obj.__index__()  instead of obj.
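
Roughly, the revised rule looks like this in C (helper name 
hypothetical, assuming nb_index returns an object):

#include <Python.h>

/* Return obj as a true Python integer, consulting nb_index if present. */
static PyObject *
as_index(PyObject *obj)
{
    if (PyInt_Check(obj) || PyLong_Check(obj)) {
        Py_INCREF(obj);
        return obj;
    }
    if (obj->ob_type->tp_as_number != NULL &&
        obj->ob_type->tp_as_number->nb_index != NULL) {
        PyObject *res = obj->ob_type->tp_as_number->nb_index(obj);
        if (res != NULL && !PyInt_Check(res) && !PyLong_Check(res)) {
            PyErr_SetString(PyExc_TypeError,
                            "__index__ must return an int or a long");
            Py_DECREF(res);
            return NULL;
        }
        return res;   /* obj.__index__(), never obj itself */
    }
    PyErr_SetString(PyExc_TypeError, "an integer is required");
    return NULL;
}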

I'll change the PEP and the implementation.  I have an updated 
implementation that uses the ssize_t patch instead.

There seem to be some issues with the ssize_t patch still, though.

Shouldn't a lot of checks for INT_MAX be replaced with PY_SSIZE_T_MAX? 
But I noticed that the PY_SSIZE_T_MAX definition in pyport.h raises errors.
I don't think it even makes sense.

-Travis



Re: [Python-Dev] ssize_t branch merged

2006-02-17 Thread Travis Oliphant
Martin v. Löwis wrote:
> Just in case you haven't noticed, I just merged
> the ssize_t branch (PEP 353).
> 
> If you have any corrections to the code to make which
> you would consider bug fixes, just go ahead.
> 
> If you are uncertain how specific problems should be resolved,
> feel free to ask.
> 
> If you think certain API changes should be made, please
> discuss them here - they would need to be reflected in the
> PEP as well.

What is PY_SSIZE_T_MAX supposed to be?  The definition in pyport.h 
doesn't compile.

Shouldn't a lot of checks for INT_MAX be replaced with PY_SSIZE_T_MAX? 
Like in the slice indexing code?

Thanks for all your effort on ssize_t fixing.  This is a *big* deal for 
64-bit number crunching with Python.

-Travis



Re: [Python-Dev] ssize_t branch merged

2006-02-17 Thread Travis Oliphant
Thomas Wouters wrote:
> On Fri, Feb 17, 2006 at 04:40:08PM -0700, Travis Oliphant wrote:
> 
> 
>>What is PY_SSIZE_T_MAX supposed to be?  The definition in pyport.h 
>>doesn't compile.
>

Maybe I have the wrong version of code.  In my pyport.h (checked out 
from svn trunk) I have.

#define PY_SSIZE_T_MAX ((Py_ssize_t)(((size_t)-1)>>1))

What is size_t?  Is this supposed to be sizeof(size_t)?

I get a syntax error when I actually use PY_SSIZE_T_MAX somewhere in the 
code.

> While looking at the piece of code in Include/pyport.h I do notice that the
> fallback (when ssize_t is not available) is to Py_uintptr_t... Which is an
> unsigned type, while ssize_t is supposed to be signed. Martin, is that on
> purpose? I don't have any systems that lack ssize_t. ;P

I saw the same thing and figured it was an error.

> 
> Adapting all code in the right way isn't finished yet (not in the last place
> because some of the code is... how shall I put it... 'creative'.)

I'm just trying to adapt my __index__ patch to use ssize_t.   I realize 
this was a big change and will take some "adjusting."  I can help with 
that if needed as I do have some experience here.  I just want to make 
sure I fully understand what issues Martin and others are concerned about.

-Travis



Re: [Python-Dev] Expose the array interface in Python 2.5?

2006-03-17 Thread Travis Oliphant
Edward C. Jones wrote:
> "Travis E. Oliphant" <[EMAIL PROTECTED]> wrote:
> 
>  > It is very important for many people to get access to memory with some
>  > description of how that memory should be interpreted as an array.
>  > Several Python packages could benefit if Python had some notion of an
>  > array interface that was at least supported in a duck-typing fashion.
> 
> Which packages? Which people? Which constituencies?
> 

I think I spell it out later.  Do you really need to argue about whether 
or not an array interface is a useful thing?  I thought we were beyond 
that and to the point of trying to figure out how to get the many groups 
to agree at least on a common interface.

> "Travis E. Oliphant" <[EMAIL PROTECTED]> also wrote:
> 
>  > My big quest is to get PIL, PyVox, WxPython, PyOpenGL, and so forth to
>  > be able to use the same interface.  Blessing the interface by
>  > including it in the Python core would help.  I'm also just wanting
>  > people in py-dev to get the concept of an array interface on their
>  > radar, as discussions of new bytes types emerges.
> 
> I use PIL and numarray a lot. It would be nice if they used a common 
> array format so I would not need to convert back and forth. But I 
> survive quite well with the current arrangement.
>

We all "survive",  but saying it is "quite well" is a bit optimistic as 
it means many very useful applications are harder to write than they 
really need to be.

> Many other packages besides PIL and Numeric / numarray / Numpy are 
> involved here: byte, struct, ctypes, SWIG, PyTables, Psyco, PyPy, Pyrex, 
> etc. There are some major issues that need to be dealt with which I will 

Sure they are involved, but I would argue the other ones you list care 
less about the multidimensional aspect of the array interface. 
(Actually PyTables just uses NumPy and so it should not be discussed as 
a "separate" package --- i.e. PyTables already tries to get along with 
NumPy as do many other packages...)

> 
> A data structure without an API and thorough documentation is useless. 
> Any proposal needs to include them from the very start.

Again, I restate.  The Numeric structure has been documented and has 
been around for a *long* time.  I'm just trying to get this basic 
interface into Python as a very simple object.  Let's not try to make it 
so complicated that nobody can agree on what it should do.  To be 
specific, I want to see a type object with almost none of the Type 
structure filled in with specific behavior.

I'm mainly interested in an array structure that other packages can rely 
on (and inherit from if they so choose).

Because the C-structure of the Numeric PyArrayObject (which NumPy also 
uses) is so widely known and used and documented for over 10 years, I 
argue it ought to form the foundation for this simple Python object.

We can argue about explicit multidimensional indexing behavior, but to 
hold hostage the introduction of a simple inheritable object to 
disagreements about those more complicated issues seems to be missing 
the mark.

> 
> How should Python interact with low level data? By "low level data" I 
> mean data as seen by C, C++, and FORTRAN as well as linear arrays of bytes.

This is already known about in Numeric.  That's what I'm saying. 
Numeric handles this well, let's just bring over this basic memory model 
for an array over to Python itself and not worry about the other 
TypeObject function-pointer tables until later.

Everybody I talked to at SciPy was very enthused about this concept. 
There is a large number of people who don't read Python-dev that I'm 
speaking for here.

> 
> What changes in Python would make the packages listed above easier to 
> write and use? Easier enough to write that the package owners would be 
> willing to give up control of part of their packages.

They don't have to give up control if we just introduce a simple memory 
model for an array.


Thanks for your comments,


-Travis



[Python-Dev] INPLACE_ADD and INPLACE_MULTIPLY oddities in ceval.c

2006-03-27 Thread Travis Oliphant

If you have Numeric or numpy installed try this:

#import Numeric as N
import numpy as N

a = range(10)
b = N.arange(10)

a.__iadd__(b)

print a

Result:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


Contrast the returned output with

import numpy as N

a = range(10)
b = N.arange(10)

a += b

print a

Result:

[ 0  2  4  6  8 10 12 14 16 18]


Having "a+=b" and "a.__iadd__(b)" do different things seems like an 
unfortunate bug.

It seems to me that the problem is that the INPLACE_ADD and 
INPLACE_MULTIPLY cases in ceval.c use the PyNumber_InPlace* functions 
without trying the PySequence_InPlace* functions when the object 
doesn't support the in-place number protocol.
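
One way to sketch the suggested fallback (not a tested patch):

#include <Python.h>

/* Prefer the left operand's in-place sequence slot so that "a += b"
   and "a.__iadd__(b)" agree for sequences like list. */
static PyObject *
inplace_add_sketch(PyObject *v, PyObject *w)
{
    PySequenceMethods *sq = v->ob_type->tp_as_sequence;

    if (sq != NULL && sq->sq_inplace_concat != NULL)
        return sq->sq_inplace_concat(v, w);
    return PyNumber_InPlaceAdd(v, w);
}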

I could submit a patch if there is agreement that this is a problem.

-Travis











Re: [Python-Dev] PEP 359: The "make" Statement

2006-04-13 Thread Travis Oliphant
Steven Bethard wrote:
> I know 2.5's not out yet, but since I now have a PEP number, I'm going
> to go ahead and post this for discussion.  Currently, the target
> version is Python 2.6.  You can also see the PEP at:
> http://www.python.org/dev/peps/pep-0359/
> 
> Thanks in advance for the feedback!

I generally like the idea.  A different name would be better.

Here's a list of approximate synonyms that might work (ordered by my 
preference...)

induce
compose
realize
furnish
produce

And others I found in no particular order...

invent
originate
organize
build
author
generate
construct
erect
concoct
coin
establish
instigate
trigger
offer




Re: [Python-Dev] adding Construct to the standard library?

2006-04-18 Thread Travis Oliphant
Giovanni Bajo wrote:
> tomer filiba <[EMAIL PROTECTED]> wrote:
> 
> 
>>the point is -- ctypes can define C types. not the TCP/IP stack.
>>Construct can do both. it's a superset of ctype's typing mechanism.
>>but of course both have the right to *coexist* --
>>ctypes is oriented at interop with dlls, and provides the mechanisms
>>needed for that.
>>Construct is about data structures of all sorts and kinds.
>>
>>ctypes is a very helpful library as a builtin, and so is Construct.
>>the two don't compete on a spot in the stdlib.
> 
> 
> 
> I don't agree. Both ctypes and construct provide a way to describe a
> binary-packed structure in Python terms: and this is an overload of
> functionality. When I first saw Construct, the thing that crossed my head was:
> "hey, yet another syntax to describe a binary-packed structure in Python".
> ctypes uses its description to interoperate with native libraries, while
> Construct uses its to interoperate with binary protocols. I didn't see a good
> reason why you shouldn't extend ctypes so to provide features that it is
> currently missing. It looks like it could be easily extended to do so.
> 

For what it's worth,  NumPy also defines a data-type object which it 
uses to describe the fundamental data-type of an array.  In the context 
of this thread it is also yet another way to describe a binary-packed 
structure in Python.

This data-type object is a builtin object which provides information 
such as byte-order, element size, "kind" as well as the notion of fields 
so that nested structures can be easily defined.
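
As a rough sketch (field names invented here; NumPy's real PyArray_Descr 
carries more), this is the kind of information the data-type object 
holds:

#include <Python.h>

typedef struct {
    PyObject_HEAD
    char kind;          /* 'b', 'i', 'u', 'f', 'c', 'S', 'V', ... */
    char byteorder;     /* '<' little, '>' big, '=' native, '|' n/a */
    int elsize;         /* element size in bytes */
    PyObject *fields;   /* dict: name -> (descr, offset), or NULL */
} dtype_sketch;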

Soon (over the next six months) a basic array object (a super class of 
NumPy) will be proposed for inclusion in Python.   When that happens 
some kind of data-type object (a super class of the NumPy dtype object) 
will be needed as well.

I think some cross-talk between all of us different users of the notion 
of what we in the NumPy community call a data-type might be useful.



-Travis Oliphant



Re: [Python-Dev] Is PEP 237 final -- Unifying Long Integers and Integers

2005-07-14 Thread Travis Oliphant
Keith Dart wrote:

>On Sat, 18 Jun 2005, Michael Hudson wrote:
>
>  
>
>>The shortest way I know of going from 2149871625L to -2145095671 is
>>the still-fairly-gross:
>>
>>
>>
>> >>> v = 2149871625L
>> >>> ~int(~v & 0xFFFFFFFFL)
>> -2145095671
>>
>>
>>
>>>I suppose the best thing is to introduce an "unsignedint" type for this
>>>purpose.
>>>  
>>>
>>Or some kind of bitfield type, maybe.
>>
>>C uses integers both as bitfields and to count things, and at least in
>>my opinion the default assumption in Python should be that this is
>>what an integer is being used for, but when you need a bitfield it can
>>all get a bit horrible.
>>
>>That said, I think in this case we can just make fcntl_ioctl use the
>>(new-ish) 'I' format argument to PyArg_ParseTuple and then you'll just
>>be able to use 2149871625L and be happy (I think, haven't tried this).
>>
>>
>
>Thanks for the reply. I think I will go ahead and add some extension types 
>to Python. Thankfully, Python is extensible with new objects.
>
>It is also useful (to me, anyway) to be able to map, one to one,
>external primitives from other systems to Python primitives. For
>example, CORBA and SNMP have a set of types (signed ints, unsigned ints,
>etc.) defined that I would like to interface to Python (actually I have
>already done this to some degree). But Python makes it a bit more
>difficult without that one-to-one mapping of basic types.  Having an
>unsigned int type, for example, would make it easier to interface Python
>to SNMP or even some C libraries.
>
>In other words, Since the "Real World" has these types that I must
>sometimes interface to, it is useful to have these same (predictable)
>types in Python.
>
>So, it is worth extending the basic set of data types, and I will add it
>to my existing collection of Python extensions.
>
>Therefore, I would like to ask here if anyone has already started
>something like this? If not, I will go ahead and do it (if I have time).
>
>  
>

I should make you aware that the new Numeric (Numeric3 now called 
scipy.base) has a collection of C-types that represent each 
C-datatype.   They are (arguably) useful in the context of eliminating a 
few problems in data-type coercion in scientific computing. 

These types are created in C and use multiple inheritance in C.  This is 
very similar to what you are proposing and so I thought I might make 
you aware.  Right now, the math operations from each of these types 
comes mostly from Numeric but this could be modified as desired. 

The code is available in the Numeric3 CVS tree at the numeric python 
sourceforge site.


-Travis Oliphant





[Python-Dev] Why does __getitem__ slot of builtin call sequence methods first?

2005-10-01 Thread Travis Oliphant

The new ndarray object of scipy core (successor to Numeric Python) is a 
C extension type that has a getitem defined in both the as_mapping and 
the as_sequence structure. 

The as_sequence getitem is just so PySequence_GetItem will work correctly.

As exposed to Python, the ndarray object has a .__getitem__ wrapper method.

Why does this wrapper call the sequence getitem instead of the mapping 
getitem method?

Is there anyway to get at a mapping-style __getitem__ method from Python?

This looks like a bug to me (which is why I'm posting here...)

Thanks for any help or insight.

-Travis Oliphant





Re: [Python-Dev] Why does __getitem__ slot of builtin call sequence methods first?

2005-10-01 Thread Travis Oliphant
Guido van Rossum wrote:

>On 10/1/05, Travis Oliphant <[EMAIL PROTECTED]> wrote:
>  
>
>>The new ndarray object of scipy core (successor to Numeric Python) is a
>>C extension type that has a getitem defined in both the as_mapping and
>>the as_sequence structure.
>>
>>The as_sequence mapping is just so PySequence_GetItem will work correctly.
>>
>>As exposed to Python the ndarray object has a .__getitem__  wrapper method.
>>
>>Why does this wrapper call the sequence getitem instead of the mapping
>>getitem method?
>>
>>Is there anyway to get at a mapping-style __getitem__ method from Python?
>>
>>
>
>Hmm... I'm sure the answer is in typeobject.c, but that is one of the
>more obfuscated parts of Python's guts. I wrote it four years ago and
>since then I've apparently lost enough brain cells (or migrated them
>from language implementation to to language design service :) that I
>don't understand it inside out any more like I did while I was in the
>midst of it.
>
>However, I wonder if the logic isn't such that if you define both
>sq_item and mp_subscript, __getitem__ calls sq_item; I wonder if by
>removing sq_item it might call mp_subscript? Worth a try, anyway.
>
>  
>

Thanks for the tip.  I think I figured out the problem, and it was my 
misunderstanding of how types inherit in C that was the source of my 
problem.  

Basically, Python is doing what you would expect, the mp_item is used 
for __getitem__ if both mp_item and sq_item are present.  However, the 
addition of these descriptors  (and therefore the resolution of any 
comptetion for __getitem__ calls) is done  *before*  the inheritance of 
any slots takes place. 

The new ndarray object inherits from a "big" array object that doesn't 
define the sequence and buffer protocols (which have the size limiting 
int dependencing in their interfaces).   The ndarray object has standard 
tp_as_sequence and tp_as_buffer slots filled.  

Figuring the array object would inherit its tp_as_mapping protocol from 
"big" array (which it does just fine), I did not explicitly set that 
slot in its Type object.  Thus, when PyType_Ready was called on the 
ndarray object, the tp_as_mapping was NULL and so __getitem__ mapped to 
the sequence-defined version.  Later the tp_as_mapping slots were 
inherited but too late for __getitem__ to be what I expected.

The easy fix was to initialize the tp_as_mapping slot before calling 
PyType_Ready.  Hopefully, somebody else searching in the future for an 
answer to their problem will find this discussion useful.  
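
In code, the fix amounts to something like this (type and function names 
hypothetical):

#include <Python.h>

extern PyTypeObject NdArray_Type;

/* Forward declarations of the actual implementations. */
static Py_ssize_t ndarray_length(PyObject *);
static PyObject *ndarray_subscript(PyObject *, PyObject *);
static int ndarray_ass_subscript(PyObject *, PyObject *, PyObject *);

static PyMappingMethods ndarray_as_mapping = {
    ndarray_length,         /* mp_length */
    ndarray_subscript,      /* mp_subscript */
    ndarray_ass_subscript   /* mp_ass_subscript */
};

static int
init_ndarray_type(void)
{
    /* Fill the slot *before* PyType_Ready so the __getitem__ wrapper
       resolves to mp_subscript rather than the inherited sq_item. */
    NdArray_Type.tp_as_mapping = &ndarray_as_mapping;
    return PyType_Ready(&NdArray_Type);
}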

Thanks for your help,

-Travis






[Python-Dev] Problems with the Python Memory Manager

2005-11-15 Thread Travis Oliphant

I know (thanks to Google) that much has been said in the past about the 
Python Memory Manager.  My purpose in posting is simply to given a 
use-case example of how the current memory manager (in Python 2.4.X) can 
be problematic in scientific/engineering code.

Scipy core is a replacement for Numeric.  One of the things scipy core 
does is define a new python scalar object for ever data type that an 
array can have (currently 21).   This has many advantages and is made 
feasible by the ability of Python to subtype in C.   These scalars all 
inherit from the standard Python types where there is a correspondence.

More to the point, however, these scalar objects were allocated using 
the standard PyObject_New and PyObject_Del functions which of course use 
the Python memory manager.  One user ported his (long-running) code to 
the new scipy core and found much to his dismay that what used to 
consume around 100MB now completely dominated his machine consuming up 
to 2GB of memory after only a few iterations.  After searching many 
hours for memory leaks in scipy core (not a bad exercise anyway as some 
were found), the real problem was tracked to the fact that his code 
ended up creating and destroying many of these new array scalars.  

The Python memory manager was not reusing memory (even though 
PyObject_Del was being called).  I don't know enough about the memory 
manager to understand why that was happening.  However, changing the 
allocation from PyObject_New to malloc and from PyObject_Del to free, 
fixed the problems this user was seeing.   Now the code runs for a long 
time consuming only around 100MB at-a-time.

Thus, all of the objects in scipy core now use system malloc and system 
free for their memory needs.   Perhaps this is unfortunate, but it was 
the only solution I could see in the short term.
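
Concretely, the change looks like this (struct and type names 
hypothetical):

#include <Python.h>
#include <stdlib.h>

/* Before: obj = PyObject_New(PyArrayScalarObject, &PyArrayScalar_Type);
   and PyObject_Del(obj) in the deallocator.  After: use the system
   allocator so freed blocks can actually be returned. */
static PyObject *
new_array_scalar(void)
{
    PyArrayScalarObject *obj =
        (PyArrayScalarObject *)malloc(sizeof(PyArrayScalarObject));

    if (obj == NULL)
        return PyErr_NoMemory();
    PyObject_INIT((PyObject *)obj, &PyArrayScalar_Type);
    return (PyObject *)obj;
}

/* ...and in tp_dealloc, free(obj) instead of PyObject_Del(obj). */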

In the long term, what is the status of plans to re-work the Python 
Memory manager to free memory that it acquires (or improve the detection 
of already freed memory locations).  I see from other postings that this 
has been a problem for other people as well.   Also, is there a 
recommended way for dealing with this problem other than using system 
malloc and system free (or I suppose writing your own specialized memory 
manager).

Thanks for any feedback,


-Travis Oliphant




Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-16 Thread Travis Oliphant
[EMAIL PROTECTED] wrote:

>Travis> More to the point, however, these scalar objects were allocated
>Travis> using the standard PyObject_New and PyObject_Del functions which
>Travis> of course use the Python memory manager.  One user ported his
>Travis> (long-running) code to the new scipy core and found much to his
>Travis> dismay that what used to consume around 100MB now completely
>Travis> dominated his machine consuming up to 2GB of memory after only a
>Travis> few iterations.  After searching many hours for memory leaks in
>Travis> scipy core (not a bad exercise anyway as some were found), the
>Travis> real problem was tracked to the fact that his code ended up
>Travis> creating and destroying many of these new array scalars.
>
>What Python object were his array elements a subclass of?
>  
>
These were all scipy core arrays.  The elements were therefore all 
C-like numbers (floats and integers I think).  If he obtained an element 
in Python, he would get an instance of a new "array" scalar object which 
is a builtin extension type written in C.  The important issue though is 
that these "array" scalars were allocated using PyObject_New and 
deallocated using PyObject_Del.  The problem is that the Python memory 
manager did not free the memory. 

>Travis> In the long term, what is the status of plans to re-work the
>Travis> Python Memory manager to free memory that it acquires (or
>Travis> improve the detection of already freed memory locations).  
>
>None that I'm aware of.  It's seen a great deal of work in the past and
>generally doesn't cause problems.  Maybe your user's usage patterns were
>a bad corner case.  It's hard to tell without more details.
>  
>
I think his usage pattern definitely represented a "bad" corner case.  
An unusable corner case, in fact.   At any rate, moving to use the 
system free and malloc fixed the immediate problem.  I mainly wanted to 
report the problem here just as another piece of anecdotal evidence.

-Travis



Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-16 Thread Travis Oliphant
Josiah Carlson wrote:

>Robert Kern <[EMAIL PROTECTED]> wrote:
>  
>
>>[1] There *is* an array type for general PyObjects in scipy_core, but
>>that's not being used in the code that blows up and has nothing to do
>>with the problem Travis is talking about.
>>
>>
>
>I seemed to have misunderstood the discussion.  Was the original user
>accessing and saving copies of many millions of these doubles?  
>
He *was* accessing them (therefore generating a call to an array-scalar 
object creation function).  But they *weren't being* saved.  They were 
being deleted soon after access.   That's why it was so confusing that 
his memory usage should continue to grow and grow so terribly.

As verified by removing usage of the Python PyObject_MALLOC function, it 
was the Python memory manager that was performing poorly.   Even though 
the array-scalar objects were deleted, the memory manager would not 
re-use their memory for later object creation. Instead, the memory 
manager kept allocating new arenas to cover the load (when it should 
have been able to re-use the old memory that had been freed by the 
deleted objects --- again, I don't know enough about the memory manager 
to say why this happened).

The fact that it did happen is what I'm reporting on.  If nothing will 
be done about it (which I can understand), at least this thread might 
help somebody else in a similar situation track down why their Python 
process consumes all of their memory even though their objects are being 
deleted appropriately.

Best,

-Travis


