Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread Robert Bradshaw
On Sat, Apr 14, 2012 at 11:39 PM, Stefan Behnel  wrote:
> Robert Bradshaw, 15.04.2012 08:32:
>> On Sat, Apr 14, 2012 at 11:16 PM, Stefan Behnel wrote:
>>> Robert Bradshaw, 15.04.2012 07:59:
>>>> On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote:
>>>>> There may be a lot of promotion/demotion (you likely only want the
>>>>> former) combinations, especially for multiple arguments, so perhaps it
>>>>> makes sense to limit ourselves a bit. For instance for numeric scalar
>>>>> argument types we could limit to long (and the unsigned counterparts),
>>>>> double and double complex.
>>>>>
>>>>> So char, short and int scalars will be
>>>>> promoted to long, float to double and float complex to double complex.
>>>>> Anything bigger, like long long etc will be matched specifically.
>>>>> Promotions and associated demotions if necessary in the callee should
>>>>> be fairly cheap compared to checking all combinations or going through
>>>>> the python layer.
>>>>
>>>> True, though this could be a convention rather than a requirement of
>>>> the spec. Long vs. < long seems natural, but are there any systems
>>>> where (scalar) float still has an advantage over double?
>>>>
>>>> Of course pointers like float* vs double* can't be promoted, so we
>>>> would still need this kind of type declaration.
>>>
>>> Yes, passing data sets as C arrays requires proper knowledge about their
>>> memory layout on both sides.
>>>
>>> OTOH, we are talking about functions that would otherwise be called through
>>> Python, so this could only apply for buffers anyway. So why not require a
>>> Py_buffer* as argument for them?
>>
>> That's certainly our (initial?) usecase, but there's no need to limit
>> the protocol to this.
>
> I think the question here is: is this supposed to be a best effort protocol
> for bypassing Python calls, or would it be an error in some situations if
> no matching signature can be found?

It may be an error in some cases. This isn't just about avoiding
Python calls; Dag just barely summed this up quite nicely.

- Robert
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
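
The promotion convention Mark describes in the quoted text above (char, short and int promoted to long, float to double, float complex to double complex, anything bigger matched exactly) could be sketched roughly as follows. This is only an illustration: the `PROMOTIONS` table and the `promote_signature` helper are hypothetical names, and the unsigned rows are one guess at what "the unsigned counterparts" would mean.

```python
# Hypothetical promotion table using plain C type spellings.
# Not part of the CEP; it only mirrors the convention quoted above.
PROMOTIONS = {
    "char": "long",
    "short": "long",
    "int": "long",
    "unsigned char": "unsigned long",   # assumed reading of "unsigned counterparts"
    "unsigned short": "unsigned long",
    "unsigned int": "unsigned long",
    "float": "double",
    "float complex": "double complex",
}


def promote(ctype):
    """Return the promoted type, or the type itself if it is matched exactly."""
    return PROMOTIONS.get(ctype, ctype)


def promote_signature(arg_types):
    """Promote every scalar argument type of a signature."""
    return tuple(promote(t) for t in arg_types)
```

Pointer types such as `float*` are absent from the table, so they pass through unchanged, which matches the remark that pointers cannot be promoted.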


Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread Stefan Behnel
Dag Sverre Seljebotn, 15.04.2012 08:58:
> Ah, Cython objects. Didn't think of that. More below.
> 
> On 04/14/2012 11:02 PM, Stefan Behnel wrote:
>> thanks for writing this up. Comments inline as I read through it.
>>
>> Dag Sverre Seljebotn, 14.04.2012 21:08:
>>> each described by a function pointer and a signature specification
>>> string, such as "id)i" for {{{int f(int, double)}}}.
>>
>> How do we deal with object argument types? Do we care on the caller side?
>> Functions might have alternative signatures that differ in the type of
>> their object parameters. Or should we handle this inside of the caller and
>> expect that it's something like a fused function with internal dispatch in
>> that case?
>>
>> Personally, I think there is not enough to gain from object parameters that
>> we should handle it on the caller side. The callee can dispatch those if
>> necessary.
>>
>> What about signatures that require an object when we have a C typed value?
>>
>> What about signatures that require a C typed argument when we have an
>> arbitrary object value in our call parameters?
>>
>> We should also strip the "self" argument from the parameter list of
>> methods. That's handled by the attribute lookup before even getting at the
>> callable.
> 
> On 04/15/2012 07:59 AM, Robert Bradshaw wrote:
>> It would certainly be useful to have special syntax for memory views
>> (after nailing down a well-defined ABI for them) and builtin types.
>> Being able to declare something as taking a
>> "sage.rings.integer.Integer" could also prove useful, but could result
>> in long (and prefix-sharing) signatures, favoring the
>> runtime-allocated ids.
> 
> I do think describing Cython objects in this cross-tool CEP would work
> nicely, this is for standardized ABIs only (we can't do memoryviews either
> until their ABI is standard).

It just occurred to me that an object's type can safely be represented at
runtime as a pointer, i.e. an integer. Even if the type is heap allocated
and replaced by another one later, a signature that uses that pointer value
in its encoding would only ever match if both sides talk about the same
type at call time (because at least one of them would hold a live reference
to the type in order to actually use it).

That would mean that IDs for signatures with object arguments would have to
be generated at setup time, e.g. during module init, after importing the
respective type. But I think that's acceptable.


> I think I prefer to a) exclude it now, and b) down the line we need another
> cross-tool ABI to communicate vtables, and then we could put that into this
> CEP now.
> 
> I strongly believe we should go with the Go "duck-typing" approach for
> interfaces, i.e. it is not the declared name that should be compared but
> the method names and signatures.
> 
> The only question that needs answering for CEP1000 is: Would this blow up
> the signature string enough that interning is the only viable option?

That sounds excessive to me. Why would you want to test interfaces of
arguments as part of the signature matching? Isn't that something that the
callee should do when it actually needs a specific interface internally?

Is there an important use case for passing objects with different
interfaces as the same argument into the same callable? At least, it
doesn't sound like such a use case would be performance critical in terms
of the call overhead.

Stefan
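
To make the pointer-as-ID idea above concrete, here is a small Python sketch. It assumes CPython, where `id(t)` is the address of the type object; `signature_id`, `caller_matches` and the `Integer` class are hypothetical stand-ins, not anything defined by the CEP.

```python
def signature_id(arg_types):
    """Build a hashable ID from the addresses of the argument types.

    In CPython, id(t) is the memory address of the type object t.
    """
    return tuple(id(t) for t in arg_types)


class Integer:
    """Stand-in for an extension type such as sage.rings.integer.Integer."""


# Computed once at setup time (e.g. during module init), after the type
# has been imported. Holding the tuple keeps the types alive, so their
# addresses cannot be reused while the ID is in use.
CALLEE_TYPES = (Integer, float)
CALLEE_SIG = signature_id(CALLEE_TYPES)


def caller_matches(arg_types):
    """True only if the caller refers to the very same type objects."""
    return signature_id(arg_types) == CALLEE_SIG
```

The comparison succeeds only when both sides hold the same type objects, which is the property the message above relies on.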


Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread Robert Bradshaw
On Sat, Apr 14, 2012 at 11:58 PM, Dag Sverre Seljebotn
 wrote:
> Ah, Cython objects. Didn't think of that. More below.
>
>
> On 04/14/2012 11:02 PM, Stefan Behnel wrote:
>>
>> Hi,
>>
>> thanks for writing this up. Comments inline as I read through it.
>>
>> Dag Sverre Seljebotn, 14.04.2012 21:08:
>>>
>>> each described by a function pointer and a signature specification
>>>
>>> string, such as "id)i" for {{{int f(int, double)}}}.
>>
>>
>> How do we deal with object argument types? Do we care on the caller side?
>> Functions might have alternative signatures that differ in the type of
>> their object parameters. Or should we handle this inside of the caller and
>> expect that it's something like a fused function with internal dispatch in
>> that case?
>
>>
>> Personally, I think there is not enough to gain from object parameters
>> that
>> we should handle it on the caller side. The callee can dispatch those if
>> necessary.
>>
>> What about signatures that require an object when we have a C typed value?
>>
>> What about signatures that require a C typed argument when we have an
>> arbitrary object value in our call parameters?
>>
>> We should also strip the "self" argument from the parameter list of
>> methods. That's handled by the attribute lookup before even getting at the
>> callable.
>
> On 04/15/2012 07:59 AM, Robert Bradshaw wrote:
>> It would certainly be useful to have special syntax for memory views
>> (after nailing down a well-defined ABI for them) and builtin types.
>> Being able to declare something as taking a
>> "sage.rings.integer.Integer" could also prove useful, but could result
>> in long (and prefix-sharing) signatures, favoring the
>> runtime-allocated ids.
>
>
> I do think describing Cython objects in this cross-tool CEP would work
> nicely, this is for standardized ABIs only (we can't do memoryviews either
> until their ABI is standard).
>
> I think I prefer to a) exclude it now, and b) down the line we need another
> cross-tool ABI to communicate vtables, and then we could put that into this
> CEP now.
>
> I strongly believe we should go with the Go "duck-typing" approach for
> interfaces, i.e. it is not the declared name that should be compared but the
> method names and signatures.
>
> The only question that needs answering for CEP1000 is: Would this blow up
> the signature string enough that interning is the only viable option?

Exactly.

> Some strcmp solutions:
>
>  a) Hash each vtable descriptor to 160-bits, and assume the hash is unique.
> Still, a couple of interfaces would blow up the signature string a lot.
>
>  b) Modify approach B in CEP 1000 to this: If it is longer than 160 bits,
> take a full cryptographic hash, and just assume there won't be hash
> collisions (like git does). This still saves for short signature strings,
> and avoids interning at the cost of doing 160-bit comparisons.
>
> Both of these require other ways at getting at the actual string data. But I
> still like b) above better than interning.

Requiring an implementation of (or at least access to) a cryptographic
hash greatly complicates the spec. (On another note, even a simple
hash as a prefix might be useful to prevent a lot of false partial
matches, e.g. "sage.rings...") 160 * n bits starts to get large too
(and we'd have to twiddle them to insert/avoid a "dash" every 16
bytes).

Here's a crazy thought: we could assume signatures like this are
"application specific." We can partition up portions of the signature
space to individual projects to compute however they want. Cython can
do this via interning for those signatures containing Cython types
(which is not an undue burden for anyone attempting to interoperate
with Cython types). For (some superset of) the basic C types we agree
on a common encoding and inline it.

- Robert
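
One way to picture the partitioning suggested above: short signatures made of basic C types are encoded inline into a 64-bit key, and everything else gets a runtime-allocated interned ID. The sketch below is an assumption-laden illustration; `encode`, `intern_signature` and the 7-byte inline limit (leaving room for a terminating NUL) are hypothetical choices, not something the thread has agreed on.

```python
import struct

_interned = {}  # signature string -> small runtime-allocated id


def intern_signature(sig):
    """Return a stable id for sig, allocating a new one on first use."""
    return _interned.setdefault(sig, len(_interned) + 1)


def encode(sig, max_inline=7):
    """Return (kind, key) for a signature string.

    Signatures of at most max_inline bytes are NUL-padded to 8 bytes and
    read as one little-endian 64-bit integer; longer ones are interned.
    """
    raw = sig.encode("ascii")
    if len(raw) <= max_inline:
        (key,) = struct.unpack("<Q", raw.ljust(8, b"\0"))
        return ("inline", key)
    return ("interned", intern_signature(sig))
```

The `kind` tag stands in for whatever bits a real layout would reserve to keep the two portions of the key space apart.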


Re: [Cython] [cython-users] Cython 0.16 RC 1

2012-04-15 Thread Robert Bradshaw
On Sat, Apr 14, 2012 at 9:14 PM, Al Danial  wrote:
> On Thu, Apr 12, 2012 at 7:38 AM, mark florisson 
> wrote:
>>
>> Yet another release candidate, this will hopefully be the last before
>> the 0.16 release. You can grab it from here:
>> http://wiki.cython.org/ReleaseNotes-0.16
>
>> If there are any problems, please let us know.
>
> I'm having the same problem ("Cannot convert 'PyObject *' to Python object",
> ref my posts at
> http://groups.google.com/group/cython-users/browse_thread/thread/d1a727e9d61f93b6#)
> on my code as with the release candidate 0.  The code builds and runs
> cleanly with 0.15.1.  To duplicate:
>
>  svn co http://pynastran.googlecode.com/svn/trunk/pyNastran/op4
>  cd op4
>  make clean ; make

Including the problematic line would have been helpful.

ndarray.base =  array_wrapper_RS

This is due to the Numpy 1.7 fix. I think we need to pull these
commits out for now: https://github.com/cython/cython/pull/112

- Robert


Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread Dag Sverre Seljebotn

On 04/15/2012 09:30 AM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 15.04.2012 08:58:

Ah, Cython objects. Didn't think of that. More below.

On 04/14/2012 11:02 PM, Stefan Behnel wrote:

thanks for writing this up. Comments inline as I read through it.

Dag Sverre Seljebotn, 14.04.2012 21:08:

each described by a function pointer and a signature specification
string, such as "id)i" for {{{int f(int, double)}}}.


How do we deal with object argument types? Do we care on the caller side?
Functions might have alternative signatures that differ in the type of
their object parameters. Or should we handle this inside of the caller and
expect that it's something like a fused function with internal dispatch in
that case?

Personally, I think there is not enough to gain from object parameters that
we should handle it on the caller side. The callee can dispatch those if
necessary.

What about signatures that require an object when we have a C typed value?

What about signatures that require a C typed argument when we have an
arbitrary object value in our call parameters?

We should also strip the "self" argument from the parameter list of
methods. That's handled by the attribute lookup before even getting at the
callable.


On 04/15/2012 07:59 AM, Robert Bradshaw wrote:

It would certainly be useful to have special syntax for memory views
(after nailing down a well-defined ABI for them) and builtin types.
Being able to declare something as taking a
"sage.rings.integer.Integer" could also prove useful, but could result
in long (and prefix-sharing) signatures, favoring the
runtime-allocated ids.


I do think describing Cython objects in this cross-tool CEP would work
nicely, this is for standardized ABIs only (we can't do memoryviews either
until their ABI is standard).


It just occurred to me that an object's type can safely be represented at
runtime as a pointer, i.e. an integer. Even if the type is heap allocated
and replaced by another one later, a signature that uses that pointer value
in its encoding would only ever match if both sides talk about the same
type at call time (because at least one of them would hold a live reference
to the type in order to actually use it).


The missing piece here is that both Robert and I are huge fans of
Go-style polymorphism. If you haven't read up on that, I highly recommend
it. The basic idea is that if you agree on method names and their
signatures, you don't need access to the same interface declaration
(you don't have to call the interface the same thing).


Guess we should let this rest for a few days and get back to it with 
some benchmarks; since all we need to solve in CEP1000 is interned vs. 
strcmp. I'll try to do that.


Dag


Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread Dag Sverre Seljebotn

On 04/15/2012 09:39 AM, Robert Bradshaw wrote:

On Sat, Apr 14, 2012 at 11:58 PM, Dag Sverre Seljebotn
  wrote:

Ah, Cython objects. Didn't think of that. More below.


On 04/14/2012 11:02 PM, Stefan Behnel wrote:


Hi,

thanks for writing this up. Comments inline as I read through it.

Dag Sverre Seljebotn, 14.04.2012 21:08:


each described by a function pointer and a signature specification

string, such as "id)i" for {{{int f(int, double)}}}.



How do we deal with object argument types? Do we care on the caller side?
Functions might have alternative signatures that differ in the type of
their object parameters. Or should we handle this inside of the caller and
expect that it's something like a fused function with internal dispatch in
that case?




Personally, I think there is not enough to gain from object parameters
that
we should handle it on the caller side. The callee can dispatch those if
necessary.

What about signatures that require an object when we have a C typed value?

What about signatures that require a C typed argument when we have an
arbitrary object value in our call parameters?

We should also strip the "self" argument from the parameter list of
methods. That's handled by the attribute lookup before even getting at the
callable.


On 04/15/2012 07:59 AM, Robert Bradshaw wrote:

It would certainly be useful to have special syntax for memory views
(after nailing down a well-defined ABI for them) and builtin types.
Being able to declare something as taking a
"sage.rings.integer.Integer" could also prove useful, but could result
in long (and prefix-sharing) signatures, favoring the
runtime-allocated ids.



I do think describing Cython objects in this cross-tool CEP would work
nicely, this is for standardized ABIs only (we can't do memoryviews either
until their ABI is standard).

I think I prefer to a) exclude it now, and b) down the line we need another
cross-tool ABI to communicate vtables, and then we could put that into this
CEP now.

I strongly believe we should go with the Go "duck-typing" approach for
interfaces, i.e. it is not the declared name that should be compared but the
method names and signatures.

The only question that needs answering for CEP1000 is: Would this blow up
the signature string enough that interning is the only viable option?


Exactly.


Some strcmp solutions:

  a) Hash each vtable descriptor to 160-bits, and assume the hash is unique.
Still, a couple of interfaces would blow up the signature string a lot.

  b) Modify approach B in CEP 1000 to this: If it is longer than 160 bits,
take a full cryptographic hash, and just assume there won't be hash
collisions (like git does). This still saves for short signature strings,
and avoids interning at the cost of doing 160-bit comparisons.

Both of these require other ways at getting at the actual string data. But I
still like b) above better than interning.


Requiring an implementation of (or at least access to) a cryptographic
hash greatly complicates the spec. (On another note, even a simple
hash as a prefix might be useful to prevent a lot of false partial
matches, e.g. "sage.rings...") 160 * n bits starts to get large too
(and we'd have to twiddle them to insert/avoid a "dash" every 16
bytes).


Do you really think it complicates the spec? SHA-1 is pretty standard, 
and Python ships with hashlib (the hashing part isn't performance critical).


I prefer hashing to string-interning as it can still be done 
compile-time etc. 160 bits isn't worse than the second-to-best strcmp 
case of a 256-bit function entry.


Shortening the hash to 120 bits (truncation) we could have a spec like this:

 - Short signature: [64 bit encoded signature. 64 bit funcptr]
 - Long signature: [64 bit hash, 64 bit pointer to full signature,
8 bit guard byte, 56 bits remaining hash,
64 bit funcptr]


Anyway: Looks like it's about time to do some benchmarks. I'll try to 
get around to it next week.



Dag
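
The two entry layouts proposed above can be sketched with `hashlib` and `struct`. This is a rough illustration under stated assumptions: the guard value `0xFF`, the little-endian byte order, the rule that signatures shorter than 8 bytes use the short form, and the `pack_entry` name are all hypothetical, and pointers are represented by plain integers.

```python
import hashlib
import struct

GUARD = 0xFF  # hypothetical guard value; the thread does not fix one


def pack_entry(signature, funcptr, sig_ptr=0):
    """Pack one table entry as bytes (128 or 256 bits)."""
    raw = signature.encode("ascii")
    if len(raw) < 8:  # leaves room for at least one terminating NUL
        # Short form: 64-bit encoded signature + 64-bit funcptr.
        return struct.pack("<8sQ", raw, funcptr)
    # SHA-1 gives 160 bits; keep the first 120 bits (15 bytes).
    digest = hashlib.sha1(raw).digest()[:15]
    # Long form: 64-bit hash, 64-bit pointer to the full signature,
    # 8-bit guard byte, remaining 56 hash bits, 64-bit funcptr.
    return struct.pack("<8sQB7sQ", digest[:8], sig_ptr, GUARD,
                       digest[8:], funcptr)
```

For example, `pack_entry("id)i", 0x1000)` yields a 16-byte entry, while a long signature such as `"sage.rings.integer.Integer)i"` yields a 32-byte entry.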


Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread Dag Sverre Seljebotn

On 04/15/2012 10:07 AM, Dag Sverre Seljebotn wrote:

On 04/15/2012 09:30 AM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 15.04.2012 08:58:

Ah, Cython objects. Didn't think of that. More below.

On 04/14/2012 11:02 PM, Stefan Behnel wrote:

thanks for writing this up. Comments inline as I read through it.

Dag Sverre Seljebotn, 14.04.2012 21:08:

each described by a function pointer and a signature specification
string, such as "id)i" for {{{int f(int, double)}}}.


How do we deal with object argument types? Do we care on the caller
side?
Functions might have alternative signatures that differ in the type of
their object parameters. Or should we handle this inside of the
caller and
expect that it's something like a fused function with internal
dispatch in
that case?

Personally, I think there is not enough to gain from object
parameters that
we should handle it on the caller side. The callee can dispatch
those if
necessary.

What about signatures that require an object when we have a C typed
value?

What about signatures that require a C typed argument when we have an
arbitrary object value in our call parameters?

We should also strip the "self" argument from the parameter list of
methods. That's handled by the attribute lookup before even getting
at the
callable.


On 04/15/2012 07:59 AM, Robert Bradshaw wrote:

It would certainly be useful to have special syntax for memory views
(after nailing down a well-defined ABI for them) and builtin types.
Being able to declare something as taking a
"sage.rings.integer.Integer" could also prove useful, but could result
in long (and prefix-sharing) signatures, favoring the
runtime-allocated ids.


I do think describing Cython objects in this cross-tool CEP would work
nicely, this is for standardized ABIs only (we can't do memoryviews
either
until their ABI is standard).


It just occurred to me that an object's type can safely be represented at
runtime as a pointer, i.e. an integer. Even if the type is heap allocated
and replaced by another one later, a signature that uses that pointer
value
in its encoding would only ever match if both sides talk about the same
type at call time (because at least one of them would hold a live
reference
to the type in order to actually use it).


The missing piece here is that both me and Robert are huge fans of
Go-style polymorphism. If you haven't read up on that I highly recommend
it, basic idea is if you agree on method names and their signatures, you
don't have to have access to the same interface declaration (you don't
have to call the interface the same thing).

Guess we should let this rest for a few days and get back to it with
some benchmarks; since all we need to solve in CEP1000 is interned vs.
strcmp. I'll try to do that.


Actually, Stefan's idea above is valid for Go-style interfaces too, just 
replace pointer with an interned string. Which is what Robert proposed too.


Dag
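
As an illustration of swapping the type pointer for an interned string, the sketch below builds a canonical descriptor from method names and signatures and interns it with `sys.intern`, so that two independently declared but identical interfaces compare equal by identity. The `interface_key` helper, the `name:signature` text format and the example signature strings are hypothetical.

```python
import sys


def interface_key(methods):
    """Return a canonical, interned descriptor for (name, signature) pairs.

    Sorting makes the descriptor independent of declaration order.
    """
    text = ";".join(f"{name}:{sig}" for name, sig in sorted(methods))
    return sys.intern(text)


# Two modules declaring "the same" interface under different names and
# in a different method order still end up with the same interned key.
a = interface_key([("read", "i)O"), ("close", ")v")])
b = interface_key([("close", ")v"), ("read", "i)O")])
```

After interning, matching is a pointer comparison (`a is b`) instead of a string comparison.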


Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread Nathaniel Smith
On Sun, Apr 15, 2012 at 9:15 AM, Dag Sverre Seljebotn
 wrote:
> Do you really think it complicates the spec? SHA-1 is pretty standard, and
> Python ships with hashlib (the hashing part isn't performance critical).
>
> I prefer hashing to string-interning as it can still be done compile-time
> etc. 160 bits isn't worse than the second-to-best strcmp case of a 256-bit
> function entry.

If you're *so* set on compile-time calculation, one could also
accommodate these within the intern framework pretty easily. Any
PyString/PyBytes * will be aligned, which means the low bit will not
be set, which means there are at least 2**31 bit-patterns that will
never be used by a run-time interned string. So we could write down a
lookup table in the spec that assigns arbitrary, well-known numbers to
every common signature. "dd->d" is 1, "ii->i" is 2, etc. If you have
15 standard types, then you can assign such an id to every 0, 1, 2, 3,
4, 5, and 6 argument function with space left over.

And this could all be abstracted away inside the intern() function.
The only thing is that if you wanted to look at the characters in the
interned string, you'd have to call a disintern() function instead of
just following the pointer.

I still think all this stuff would be complexity for its own sake, though.

> Shortening the hash to 120 bits (truncation) we could have a spec like this:
>
>  - Short signature: [64 bit encoded signature. 64 bit funcptr]
>  - Long signature: [64 bit hash, 64 bit pointer to full signature,
>                    8 bit guard byte, 56 bits remaining hash,
>                    64 bit funcptr]

This is a fixed length encoding, so why does it need a guard byte?

BTW, the guard byte design in the last version of the CEP looks buggy
to me -- there's no guarantee that a valid pointer won't contain
the guard byte by accident. A solution would be to move the
to-be-continued byte (or bit) to the first word. This would also mean
that if you're looking for a one-word signature via switch(), you
won't hit signatures which have your signature as a prefix. In the
variable-length encoding with the lookup rule you suggested you'd also
want a second bit to mark the actual beginning of each structure, so
you don't get hits on the middle of structures.

> Anyway: Looks like it's about time to do some benchmarks. I'll try to get
> around to it next week.

 Agreed :-).

- N
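
The counting argument above can be checked with a few lines of Python. The sketch assumes a signature is a return type plus 0 to 6 argument types, all drawn from the same 15 types, and it assumes well-known IDs are stored with the low bit set so they can never equal an aligned pointer; `well_known_id` and `is_well_known` are hypothetical helper names.

```python
N_TYPES = 15   # "15 standard types" from the message above
MAX_ARGS = 6

# Assumption: one return type plus nargs argument types per signature.
total = sum(N_TYPES ** (nargs + 1) for nargs in range(MAX_ARGS + 1))


def well_known_id(n):
    """Tag table index n so it can never equal an aligned pointer."""
    return (n << 1) | 1


def is_well_known(key):
    """Aligned pointers have the low bit clear; well-known ids have it set."""
    return (key & 1) == 1
```

Under these assumptions `total` is 183063615, comfortably below the 2**31 odd bit patterns available even in a 32-bit word.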


Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread Nathaniel Smith
On Sun, Apr 15, 2012 at 9:07 AM, Dag Sverre Seljebotn
 wrote:
> On 04/15/2012 09:30 AM, Stefan Behnel wrote:
>>
>> Dag Sverre Seljebotn, 15.04.2012 08:58:
>>>
>>> Ah, Cython objects. Didn't think of that. More below.
>>>
>>> On 04/14/2012 11:02 PM, Stefan Behnel wrote:

 thanks for writing this up. Comments inline as I read through it.

 Dag Sverre Seljebotn, 14.04.2012 21:08:
>
> each described by a function pointer and a signature specification
> string, such as "id)i" for {{{int f(int, double)}}}.


 How do we deal with object argument types? Do we care on the caller
 side?
 Functions might have alternative signatures that differ in the type of
 their object parameters. Or should we handle this inside of the caller
 and
 expect that it's something like a fused function with internal dispatch
 in
 that case?

 Personally, I think there is not enough to gain from object parameters
 that
 we should handle it on the caller side. The callee can dispatch those if
 necessary.

 What about signatures that require an object when we have a C typed
 value?

 What about signatures that require a C typed argument when we have an
 arbitrary object value in our call parameters?

 We should also strip the "self" argument from the parameter list of
 methods. That's handled by the attribute lookup before even getting at
 the
 callable.
>>>
>>>
>>> On 04/15/2012 07:59 AM, Robert Bradshaw wrote:

 It would certainly be useful to have special syntax for memory views
 (after nailing down a well-defined ABI for them) and builtin types.
 Being able to declare something as taking a
 "sage.rings.integer.Integer" could also prove useful, but could result
 in long (and prefix-sharing) signatures, favoring the
 runtime-allocated ids.
>>>
>>>
>>> I do think describing Cython objects in this cross-tool CEP would work
>>> nicely, this is for standardized ABIs only (we can't do memoryviews
>>> either
>>> until their ABI is standard).
>>
>>
>> It just occurred to me that an object's type can safely be represented at
>> runtime as a pointer, i.e. an integer. Even if the type is heap allocated
>> and replaced by another one later, a signature that uses that pointer
>> value
>> in its encoding would only ever match if both sides talk about the same
>> type at call time (because at least one of them would hold a live
>> reference
>> to the type in order to actually use it).
>
>
> The missing piece here is that both me and Robert are huge fans of Go-style
> polymorphism. If you haven't read up on that I highly recommend it, basic
> idea is if you agree on method names and their signatures, you don't have to
> have access to the same interface declaration (you don't have to call the
> interface the same thing).

Go style polymorphism is certainly a neat idea, but two points:

- You can't do this kind of matching via signature comparison. If I
have a type with methods "foo", "bar" and "baz", then that should
match the interface {"foo", "bar", "baz"}, but also {"foo", "bar"},
{"foo", "baz"}, {"bar"}, {}, etc. To find the right function for such
a type, you need to decode each function signature and check them in
some structured way. Unless your plan is to precompute the hash of all
2**n interfaces that each object fulfills.

- Adding a whole new type system with polymorphic dispatch is a heck
of a thing to do in a spec for boxing and unboxing pointers. Honestly
at this level I'm even leery of describing Python objects via their
type, as opposed to just "PyObject *". Just let the callee do the type
checking if they need to, and if it later turns out that there are
actually enough cases where Cython knows the exact type at compile
time and is dispatching through a boxed pointer and the callee type
checking is significant overhead, then extend the spec then.

-- Nathaniel
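
The first point above can be illustrated in a few lines: structural (Go-style) matching is a subset test, and a type with n methods satisfies 2**n interfaces, so a single precomputed signature cannot cover them. This sketch compares method names only and ignores method signatures; `satisfies` and `all_satisfied_interfaces` are hypothetical names.

```python
from itertools import combinations


def satisfies(type_methods, interface):
    """Go-style check: the type provides every method the interface names."""
    return set(interface) <= set(type_methods)


def all_satisfied_interfaces(type_methods):
    """Enumerate every interface (method subset) the type fulfils."""
    methods = sorted(type_methods)
    return [frozenset(c)
            for r in range(len(methods) + 1)
            for c in combinations(methods, r)]
```

For a type with methods foo, bar and baz, `all_satisfied_interfaces` returns 8 subsets, from the empty interface up to all three methods.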


Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread Dag Sverre Seljebotn


Nathaniel Smith  wrote:

>On Sun, Apr 15, 2012 at 9:15 AM, Dag Sverre Seljebotn
> wrote:
>> Do you really think it complicates the spec? SHA-1 is pretty
>standard, and
>> Python ships with hashlib (the hashing part isn't performance
>critical).
>>
>> I prefer hashing to string-interning as it can still be done
>compile-time
>> etc. 160 bits isn't worse than the second-to-best strcmp case of a
>256-bit
>> function entry.
>
>If you're *so* set on compile-time calculation, one could also
>accommodate these within the intern framework pretty easily. Any
>PyString/PyBytes * will be aligned, which means the low bit will not
>be set, which means there are at least 2**31 bit-patterns that will
>never be used by a run-time interned string. So we could write down a
>lookup table in the spec that assigns arbitrary, well-known numbers to
>every common signature. "dd->d" is 1, "ii->i" is 2, etc. If you have
>15 standard types, then you can assign such an id to every 0, 1, 2, 3,
>4, 5, and 6 argument function with space left over.
>
>And this could all be abstracted away inside the intern() function.
>The only thing is that if you wanted to look at the characters in the
>interned string, you'd have to call a disintern() function instead of
>just following the pointer.
>
>I still think all this stuff would be complexity for its own sake,
>though.
>
>> Shortening the hash to 120 bits (truncation) we could have a spec
>like this:
>>
>>  - Short signature: [64 bit encoded signature. 64 bit funcptr]
>>  - Long signature: [64 bit hash, 64 bit pointer to full signature,
>>                    8 bit guard byte, 56 bits remaining hash,
>>                    64 bit funcptr]
>
>This is a fixed length encoding, so why does it need a guard byte?

No, there are two cases, one 128 bit and one 256 bit.

>
>BTW, the guard byte design in the last version of the CEP looks buggy
>to me -- there's no guarantee that a valid pointer might not contain
>the guard byte by accident. A solution would be to move the

In the CEP text some posts ago? I am pretty sure I made sure that pointers 
would never be looked at -- you are supposed to scan in 128 bit jumps and will 
never look at the beginning of a pointer. Read it again and see if you can make 
a counterexample...

That is the reason the above works, and why I split the hash in two segments.


>to-be-continued byte (or bit) to the first word. This would also mean
>that if you're looking for a one-word signature via switch(), you
>won't hit signatures which have your signature as a prefix. In the

You need 0-termination to be part of the signature (and if the 0 spills over, 
you spill over).

I should have said that, good catch.

Dag

>variable-length encoding with the lookup rule you suggested you'd also
>want a second bit to mark the actual beginning of each structure, so
>you don't get hits on the middle of structures.
>
>> Anyway: Looks like it's about time to do some benchmarks. I'll try to
>get
>> around to it next week.
>
> Agreed :-).
>
>- N
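
One possible reading of the "scan in 128 bit jumps" rule from the reply above, sketched in Python. It assumes the layouts quoted in this message, little-endian packing, NUL-padded short signatures and a guard value of `0xFF`; `find_short` and the sample table are hypothetical. Only the first 64 bits of each 128-bit chunk are compared, so the pointers in the second half of each chunk are never interpreted as signature data.

```python
import struct

GUARD = 0xFF  # hypothetical value that no encoded short signature starts with


def find_short(table, signature):
    """Scan table in 16-byte (128-bit) steps for a short signature."""
    key = signature.encode("ascii").ljust(8, b"\0")
    for off in range(0, len(table), 16):
        # Compare only the first 8 bytes of each chunk: either an encoded
        # signature, the first hash word, or guard byte + remaining hash.
        if table[off:off + 8] == key:
            (funcptr,) = struct.unpack_from("<Q", table, off + 8)
            return funcptr
    return None


# A 256-bit long entry followed by a 128-bit short entry.
long_entry = struct.pack("<8sQB7sQ", b"\x11" * 8, 0x2000,
                         GUARD, b"\x22" * 7, 0x3000)
short_entry = struct.pack("<8sQ", b"id)i", 0x1000)
table = long_entry + short_entry
```

In this layout the second chunk of a long entry starts with the guard byte, which is why splitting the hash into a 64-bit and a 56-bit part keeps every compared word free of pointer data.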


Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread Dag Sverre Seljebotn


Nathaniel Smith  wrote:

>On Sun, Apr 15, 2012 at 9:07 AM, Dag Sverre Seljebotn
> wrote:
>> On 04/15/2012 09:30 AM, Stefan Behnel wrote:
>>>
>>> Dag Sverre Seljebotn, 15.04.2012 08:58:

 Ah, Cython objects. Didn't think of that. More below.

 On 04/14/2012 11:02 PM, Stefan Behnel wrote:
>
> thanks for writing this up. Comments inline as I read through it.
>
> Dag Sverre Seljebotn, 14.04.2012 21:08:
>>
>> each described by a function pointer and a signature specification
>> string, such as "id)i" for {{{int f(int, double)}}}.
>
>
> How do we deal with object argument types? Do we care on the caller side?
> Functions might have alternative signatures that differ in the type of
> their object parameters. Or should we handle this inside of the caller and
> expect that it's something like a fused function with internal dispatch in
> that case?
>
> Personally, I think there is not enough to gain from object parameters that
> we should handle it on the caller side. The callee can dispatch those if
> necessary.
>
> What about signatures that require an object when we have a C typed value?
>
> What about signatures that require a C typed argument when we have an
> arbitrary object value in our call parameters?
>
> We should also strip the "self" argument from the parameter list of
> methods. That's handled by the attribute lookup before even getting at the
> callable.


 On 04/15/2012 07:59 AM, Robert Bradshaw wrote:
>
> It would certainly be useful to have special syntax for memory views
> (after nailing down a well-defined ABI for them) and builtin types.
> Being able to declare something as taking a
> "sage.rings.integer.Integer" could also prove useful, but could result
> in long (and prefix-sharing) signatures, favoring the
> runtime-allocated ids.


 I do think describing Cython objects in this cross-tool CEP would work
 nicely, this is for standardized ABIs only (we can't do memoryviews either
 until their ABI is standard).
>>>
>>>
>>> It just occurred to me that an object's type can safely be represented at
>>> runtime as a pointer, i.e. an integer. Even if the type is heap allocated
>>> and replaced by another one later, a signature that uses that pointer value
>>> in its encoding would only ever match if both sides talk about the same
>>> type at call time (because at least one of them would hold a live reference
>>> to the type in order to actually use it).
>>
>>
>> The missing piece here is that both Robert and I are huge fans of Go-style
>> polymorphism. If you haven't read up on that I highly recommend it; the basic
>> idea is that if you agree on method names and their signatures, you don't
>> have to have access to the same interface declaration (you don't have to
>> call the interface the same thing).
>
>Go style polymorphism is certainly a neat idea, but two points:
>
>- You can't do this kind of matching via signature comparison. If I
>have a type with methods "foo", "bar" and "baz", then that should
>match the interface {"foo", "bar", "baz"}, but also {"foo", "bar"},
>{"foo", "baz"}, {"bar"}, {}, etc. To find the right function for such
>a type, you need to decode each function signature and check them in
>some structured way. Unless your plan is to precompute the hash of all
>2**n interfaces that each object fulfills.

You are of course right; this needs a lot more thought.
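
Nathaniel's first point can be sketched as follows, assuming a type is described by a plain mapping from method name to signature string. The helper names and the signature strings are hypothetical. A structured subset check answers "does this type fulfill this interface?" directly, whereas matching by a single precomputed key would need one key per subset of methods, i.e. 2**n of them.

```python
from itertools import combinations


def satisfies(type_methods, interface):
    """True if every (name, signature) pair of the interface is on the type."""
    return all(type_methods.get(name) == sig for name, sig in interface.items())


def all_interfaces(type_methods):
    """Enumerate every interface (subset of methods) the type fulfills."""
    items = sorted(type_methods.items())
    return [
        dict(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    ]


# Hypothetical type with three methods; the signature strings are made up.
methods = {"foo": "i)i", "bar": "d)d", "baz": ")O"}
```

For the three methods above, `all_interfaces(methods)` has 8 entries, including the empty interface, which is the growth the precomputed-hash approach would have to pay for.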

>
>- Adding a whole new type system with polymorphic dispatch is a heck
>of a thing to do in a spec for boxing and unboxing pointers. Honestly
>at this level I'm even leery of describing Python objects via their
>type, as opposed to just "PyObject *". Just let the callee do the type
>checking if they need to, and if it later turns out that there are
>actually enough cases where Cython knows the exact type at compile
>time and is dispatching through a boxed pointer and the callee type
>checking is significant overhead, then extend the spec then.

We are not insane; it's been said several times that this goes in a later
spec. We're just trying to guess whether future developments would seriously
affect the intern vs. strcmp decision, i.e. what a likely signature length
will be in the future. We keep CEP1000 a simple spec, but spend some time
trying to guess how it could be extended.
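
A rough sketch of the trade-off being weighed, assuming a shared table that hands out one integer id per distinct signature string. `SignatureInterner`, `find_by_string` and `find_by_id` are hypothetical names, not anything from the CEP. With string comparison the per-entry cost grows with signature length; with interned ids it is one integer comparison, at the price of maintaining the shared table.

```python
class SignatureInterner:
    """Hands out one small integer id per distinct signature string."""

    def __init__(self):
        self._ids = {}

    def intern(self, sig):
        # setdefault returns the existing id, or stores and returns a new one.
        return self._ids.setdefault(sig, len(self._ids))


def find_by_string(table, wanted_sig):
    """strcmp-style lookup: each comparison looks at the signature contents."""
    for sig, func in table:
        if sig == wanted_sig:
            return func
    return None


def find_by_id(table, wanted_id):
    """Interned lookup: one integer comparison per entry."""
    for sig_id, func in table:
        if sig_id == wanted_id:
            return func
    return None


interner = SignatureInterner()
string_table = [("id)i", "f_int_double"), ("dd)d", "f_double_double")]
id_table = [(interner.intern(sig), func) for sig, func in string_table]
```

Both lookups find the same entries; which one wins depends on how long signatures get, which is the open question here.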

Dag


>
>-- Nathaniel



Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread mark florisson
On 15 April 2012 07:26, Stefan Behnel  wrote:
> mark florisson, 14.04.2012 23:15:
>> On 14 April 2012 22:02, Stefan Behnel wrote:
>>> Dag Sverre Seljebotn, 14.04.2012 21:08:
  * TBD: Support for Cython-specific constructs like memoryview slices
    (so that arrays with strides and shape can be passed faster than
    passing an {{{"O"}}}).
>>>
>>> Is this really Cython specific or would a generic Py_buffer struct work?
>>
>> That could work through simple unboxing wrapper functions, but it
>> would add some overhead, specifically because it would have to check
>> the buffer's object, and if it didn't exist or was not a memoryview
>> object, it would have to create one (checking whether something is a
>> memoryview object would also be a pain, as each module has a different
>> memoryview type). That could still be feasible for interaction with
>> Cython functions from non-Cython code.
>
> Hmm, I don't get it. Isn't the overhead always there when a memory view is
> requested in the signature? You'd have to create one for each call and that
> seriously hurts the efficiency. Is that a common use case? Why would you
> want to do more than passing unboxed buffers?
>
> Stefan

So, if you're going to accept Py_buffer *buf (which is useful in
itself), then to use memoryviews you have to copy over the
shape/strides/suboffsets and the data pointer, which is not a big
deal. But you also want a memoryview object associated with the
memoryview slice, which keeps things around like the format string,
function pointers to convert the dtype to and from Python objects, and
a reference (acquisition) count, or a lock in case atomics are not
supported by the compiler (or Cython doesn't know about the compiler).
So if buf->obj is not a memoryview object, the callee will have to
create one, and the caller will have to convert a slice to a new
Py_buffer struct.
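
As a Python-level illustration of the cheap part (copying shape, strides and friends out of an acquired buffer), here is a small sketch. It uses the built-in memoryview as a stand-in for a Py_buffer, and the `Slice` tuple and `unpack_buffer` are hypothetical names, not Cython's actual slice struct. It does not model the expensive part, i.e. creating the associated memoryview object.

```python
from collections import namedtuple

# Hypothetical stand-in for a memoryview slice struct: only the fields that
# are cheap to copy out of an already acquired buffer.
Slice = namedtuple("Slice", "obj shape strides suboffsets itemsize")


def unpack_buffer(view):
    """Copy the slice-relevant fields out of an already acquired buffer."""
    return Slice(view.obj, view.shape, view.strides, view.suboffsets,
                 view.itemsize)


backing = bytearray(48)
view = memoryview(backing).cast("d", shape=[2, 3])   # 2x3 array of C doubles
sl = unpack_buffer(view)
```

The copied fields are a handful of small tuples and one object reference, which matches the "not a big deal" estimate above.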

Arguably, the memoryview implementation is not optimal; it should have
a memoryview struct with that data, making it somewhat less expensive.

Finally, what are the semantics for Py_buffer? Will the callee own the
buffer, or will it borrow it? If they will borrow, then the compiler
will have to figure out whether it will need to own it (or be slower
and always own it), and acquire the buffer through buf->obj. At least
it won't have to validate the buffer, which is the most expensive
part.
I think in many cases you want to borrow though, but if you want to
always own, the caller could do something more efficient if
releasebuffer is not implemented, like simply incref buf->obj and pass
in a pointer to a copy of the Py_buffer. I think borrowing is probably
the easiest and most sane way though.


Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread Stefan Behnel
mark florisson, 15.04.2012 13:30:
> Finally, what are the semantics for Py_buffer? Will the callee own the
> buffer, or will it borrow it? If they will borrow, then the compiler
> will have to figure out whether it will need to own it (or be slower
> and always own it), and acquire the buffer through buf->obj. At least
> it won't have to validate the buffer, which is the most expensive
> part.
> I think in many cases you want to borrow though, but if you want to
> always own, the caller could do something more efficient if
> releasebuffer is not implemented, like simply incref buf->obj and pass
> in a pointer to a copy of the Py_buffer. I think borrowing is probably
> the easiest and most sane way though.

I think that's easy. If you request and unpack a buffer yourself, you own
it. If you receive an unpacked buffer from someone else as a call argument,
you borrow it, and you know that your caller (or the caller of your caller,
etc.) owns it and keeps it alive until you return. If you receive it as
return value of a function call, it's less clear, but my intuition tells me
that you'd normally either receive an owned Python object or a borrowed
unpacked buffer.

In the case at hand, you'd always receive a borrowed buffer from the caller
as argument.
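
The ownership rule above can be illustrated at the Python level, with memoryview standing in for an unpacked Py_buffer and `view.obj` for buf->obj. The two callee functions are hypothetical. A callee that only borrows does nothing extra; a callee that wants to keep the buffer past the call has to acquire one of its own through the exporting object.

```python
def callee_borrows(view):
    """Use the caller's buffer only for the duration of the call."""
    return sum(view)


def callee_keeps(view):
    """Acquire a buffer of its own through view.obj, to outlive the call."""
    return memoryview(view.obj)


data = bytearray(b"\x01\x02\x03")
caller_view = memoryview(data)          # the caller acquires and owns this

total = callee_borrows(caller_view)     # borrowed: nothing extra to release
kept = callee_keeps(caller_view)        # owned: a second acquisition

caller_view.release()                   # caller is done with its buffer
try:
    data.append(4)                      # still exported through `kept`
    resized_while_kept = True
except BufferError:
    resized_while_kept = False

kept.release()                          # callee gives up its own buffer
data.append(4)                          # no exports left, resize succeeds
```

The bytearray refuses to resize while `kept` is alive, which shows that the second acquisition really is independent of the caller's.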

Stefan


Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread mark florisson
On 15 April 2012 12:40, Stefan Behnel  wrote:
> mark florisson, 15.04.2012 13:30:
>> Finally, what are the semantics for Py_buffer? Will the callee own the
>> buffer, or will it borrow it? If they will borrow, then the compiler
>> will have to figure out whether it will need to own it (or be slower
>> and always own it), and acquire the buffer through buf->obj. At least
>> it won't have to validate the buffer, which is the most expensive
>> part.
>> I think in many cases you want to borrow though, but if you want to
>> always own, the caller could do something more efficient if
>> releasebuffer is not implemented, like simply incref buf->obj and pass
>> in a pointer to a copy of the Py_buffer. I think borrowing is probably
>> the easiest and most sane way though.
>
> I think that's easy. If you request and unpack a buffer yourself, you own
> it. If you receive an unpacked buffer from someone else as a call argument,
> you borrow it, and you know that your caller (or the caller of your caller,
> etc.) owns it and keeps it alive until you return. If you receive it as
> return value of a function call, it's less clear, but my intuition tells me
> that you'd normally either receive an owned Python object or a borrowed
> unpacked buffer.
>
> In the case at hand, you'd always receive a borrowed buffer from the caller
> as argument.
>
> Stefan

That makes sense, but it means a lot of overhead for memoryview
slices, which I think justifies syntax for custom types in general.


[Cython] Cython 0.16 RC 2

2012-04-15 Thread mark florisson
Hopefully a final release candidate for the 0.16 release can be found
here: http://wiki.cython.org/ReleaseNotes-0.16 . This corresponds to
the 'release' branch of the cython repository on github.


Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread Greg Ewing

Stefan Behnel wrote:

> It wasn't really a proposed syntax, I guess, more of a way to write down an
> example.


That's okay, although you might want to mention in the CEP
that the actual syntax is yet to be determined. Being a CEP,
anything it says tends to come across as being a specification
otherwise.

--
Greg


Re: [Cython] CEP1000: Native dispatch through callables

2012-04-15 Thread Greg Ewing

Robert Bradshaw wrote:

> Brevity, especially if the signature is inlined. (Encoding could take
> care of this by, e.g. ignoring the redundant opening, or we could just
> write di=d.)


Yes, I was thinking in terms of replacing the paren with
some other character, rather than inserting more parens.

--
Greg