[Cython] Cython+PyPy benchmarks

2012-07-05 Thread Stefan Behnel
Hi,

I set up a Jenkins job to run a couple of (simple) benchmarks comparing
Cython's current performance under CPython and PyPy. Note that these are
C-API intensive benchmarks by design.

https://sage.math.washington.edu:8091/hudson/job/cython-devel-cybenchmarks-pypy/lastSuccessfulBuild/artifact/bench_chart.html

Basically, PyPy's cpyext is currently about 100-200x slower than CPython's
native C-API for these kinds of benchmarks. That's because it hasn't been
optimised in any way; correctness and completeness are still the main goals
in its development (and they're not there yet).

The one major performance issue in cpyext is currently the creation and
deallocation of the PyObject representation for each object, which
obviously has a huge impact on everything. I profiled the nbody benchmark
and it showed that almost 80% of the runtime is currently spent in creating
and discarding PyObject instances. Here's the call graph:

http://cython.org/callgrind-pypy-nbody.png

The upside of this is that there is likely a lot of low-hanging fruit in
cpyext (plus some more tweaks in Cython), given that no optimisation at all
has been done so far. It shouldn't be too hard to drop the factor
substantially.

I also think we should add a couple more C-ish benchmarks to see how
much overhead there really is for less C-API intensive code.

Stefan
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] [cython-users] C++: how to handle failures of 'new'?

2012-07-05 Thread mark florisson
On 3 July 2012 20:15, Robert Bradshaw wrote:
> On Tue, Jul 3, 2012 at 11:43 AM, Dag Sverre Seljebotn wrote:
>> On 07/03/2012 08:23 PM, Robert Bradshaw wrote:
>>>
>>> On Tue, Jul 3, 2012 at 11:11 AM, Stefan Behnel wrote:

>>>> Robert Bradshaw, 03.07.2012 19:58:
>>>>>
>>>>> On Tue, Jul 3, 2012 at 9:38 AM, Stefan Behnel wrote:
>>>>>>
>>>>>> Dag Sverre Seljebotn, 03.07.2012 18:11:
>>>>>>>
>>>>>>> On 07/03/2012 09:14 AM, Stefan Behnel wrote:
>>>>>>>>
>>>>>>>> I don't know what happens if a C++ exception is not being caught, but I
>>>>>>>> guess it would simply crash the application. That's a bit more visible than
>>>>>>>
>>>>>>>
>>>>>>> Yep.
>>>>>>>
>>>>>>>> just printing a warning when a Python exception is being ignored due to a
>>>>>>>> missing declaration. It's really unfortunate that our documentation didn't
>>>>>>>> even mention the need for this, because it's not immediately obvious that
>>>>>>>> Cython won't handle errors in "new", and testing for memory errors isn't
>>>>>>>> quite what people commonly do in their test suites.
>>>>>>>>
>>>>>>>> Apart from that, I agree, users have to take care to properly declare the
>>>>>>>> API they are using.
>>>>>>>
>>>>>>>
>>>>>>> Is there any time you do NOT want a "catch (...) {}" block? I can't see a
>>>>>>> C++ exception propagating to Python-land doing anything useful ever.
>>>>>>
>>>>>>
>>>>>> That would have been my intuition, too.
>>>>>
>>>>>
>>>>> If it's actually embedded, with the main driver in C++, one might want
>>>>> it to propagate up.
>>>>
>>>>
>>>> But what kind of a propagation would that be? On the way out, it could
>>>> induce anything, from side effects to resource leaks to crashes, depending
>>>> on what the state of the surrounding code is. It would leave the whole
>>>> system in an unpredictable state. I cannot imagine anyone really wanting
>>>> this.
>>>>
>>>>>>>
>>>>>>> So shouldn't we just make --cplus turn *all* external functions and methods
>>>>>>> (whether C-like or C++-like) into "except +"? (Or keep except+ for manual
>>>>>>> translation, but always have a catch(...)".
>>>>>>>
>>>>>>> Performance overhead is the only reason I can think of to not do this,
>>>>>>> although IIRC C++ catch blocks are only dealt with during stack unwinds and
>>>>>>> doesn't cost anything/much (?) when they're not triggered.
>>>>>>>
>>>>>>> "except -1" should then actually mean both; "except + except -1". So it's
>>>>>>> more a question of just adding catch(...) *everywhere*, than making
>>>>>>> "except +" the default.
>>>>>>
>>>>>>
>>>>>> I have no idea if there is a performance impact, but if there isn't, always
>>>>>> catching all exceptions sounds like a reasonable thing to do. After all, we
>>>>>> have no support for catching C++ exceptions on user side.
>>>>>
>>>>>
>>>>> This is a bit like following every C call with "except *" (though the
>>>>> performance ratios are unclear). It just seems a lot to wrap every
>>>>> single line of a non-trivial C++ using function with try..catch
>>>>> blocks.
>>
>>
>> It seems "a lot" of just what exactly? Generated code? Binary size? Time
>> spent in GCC parser?
>
> All of the above. And we should take a look at the runtime overhead
> (which is hopefully nil, but who knows.)
>
>> Though I guess one might want to try to pull out the try-catch to at least
>> only one per code line rather than one per SimpleCallNode.
>
> Or even higher, if possible. It's still a lot.

Why would you have to do that? Can't you just insert a try/catch per
try/except or try/finally block, or if absent, the function body. That
will still work with the way temporaries are cleaned up. (It should
also be implemented for parallel/prange sections).
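A rough C++ sketch of the "one try/catch per function body" idea, under stated assumptions: `make_buffer`, `checked_get` and `function_body` are hypothetical names, the global string only stands in for setting a Python exception through the C-API, and the exception-to-error mapping loosely follows what Cython's `except +` does rather than reproducing generated code. A single handler covers several C++ calls, and locals such as the vector are destroyed during unwinding before the handler runs.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the "current Python exception" state; real
// generated code would set the error through the CPython C-API instead.
static std::string g_pending_error;

// Hypothetical external C++ functions, each of which may throw.
static std::vector<int> make_buffer(std::size_t n) {
    return std::vector<int>(n, 0);  // may throw std::bad_alloc
}

static int checked_get(const std::vector<int>& v, std::size_t i) {
    return v.at(i);  // throws std::out_of_range if i >= v.size()
}

// One try/catch around the whole function body rather than one per call.
// On failure it records an error name and returns -1, in the spirit of a
// Cython function declared "except -1".
static int function_body(std::size_t n, std::size_t i) {
    try {
        std::vector<int> buf = make_buffer(n);  // destroyed if a later call throws
        return checked_get(buf, i);
    } catch (const std::bad_alloc&) {
        g_pending_error = "MemoryError";
    } catch (const std::out_of_range&) {
        g_pending_error = "IndexError";
    } catch (...) {
        g_pending_error = "RuntimeError";  // fallback for anything else
    }
    return -1;
}
```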

>> "except *" only has a point when calling functions using the CPython API,
>> but most external C functions are pure C, not CPython-API-using-functions.
>> OTOH, all external C++ functions are C++ :-)
>
> Fair point.
>
>> (Also, if we wrote Cython from scratch now I'm pretty sure the "except *"
>> defaults would be a tad different.)
>
> For sure.
>
>>>> But if users are correct about their declarations, we'd end up with the
>>>> same thing. I think it's worth a try.
>>>
>>>
>>> Most C++ code (that I've ever run into) doesn't use exceptions,
>>> because exception handling is so broken in C++ anyways.
>>
>>
>> Except for the fact that any code touching "new" could be raising
>> exceptions? That propagates.
>
> I would guess most of the time people don't bother catching these and
> let the program die, as there's often no sane recovery (the same as
> MemoryErrors in Python, though I guess C++ is less often used from an
> event loop).
>
>> There is a lot of C++ code out there using exceptions. I'd guess that both
>> mathematical code and Google-written code is unlike most C++ code out there
>> :-) Many C++ programmers go on and on about RAII and auto_ptrs and so 

Re: [Cython] [cython-users] C++: how to handle failures of 'new'?

2012-07-05 Thread Stefan Behnel
mark florisson, 05.07.2012 20:47:
> On 3 July 2012 20:15, Robert Bradshaw wrote:
>> On Tue, Jul 3, 2012 at 11:43 AM, Dag Sverre Seljebotn
>>> Though I guess one might want to try to pull out the try-catch to at least
>>> only one per code line rather than one per SimpleCallNode.
>>
>> Or even higher, if possible. It's still a lot.
> 
> Why would you have to do that? Can't you just insert a try/catch per
> try/except or try/finally block, or if absent, the function body. That
> will still work with the way temporaries are cleaned up. (It should
> also be implemented for parallel/prange sections).

My first reaction was, "sure, smart idea". It certainly sounds like a good
idea to unify the exception handling between C++ and Python into the same
syntactic structures.

But does it allow us to handle different declarations for multiple C++
functions that get called? E.g. "except +" for one and "except
+MemoryError" for another, but both called in the same try-whatever block?
That would just lead to nested try-except blocks, I guess, thus making the
outer exception clauses mostly a fallback for exceptions that users forgot
to declare or couldn't properly handle for some reason.
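A sketch of the nesting described here, assuming two hypothetical functions: `f_plain` (declared `except +`) and `f_mem` (declared `except +MemoryError`). The inner try/catch applies the per-call declaration, while the outer, body-level handlers only act as the fallback. Error names are plain strings standing in for real Python exceptions, and the `invalid_argument` to `ValueError` mapping is assumed to follow Cython's documented `except +` behaviour.

```cpp
#include <stdexcept>
#include <string>

static std::string g_pending_error;  // hypothetical stand-in for the Python error state

static int f_plain(int x) {  // hypothetical, declared "except +"
    if (x < 0) throw std::invalid_argument("negative");
    return x;
}

static int f_mem(int x) {  // hypothetical, declared "except +MemoryError"
    if (x > 100) throw std::runtime_error("too big");
    return x * 2;
}

// Both calls live in the same Cython-level block; returns -1 on error.
static int try_block(int a, int b) {
    try {
        int r = f_plain(a);
        try {
            r += f_mem(b);
        } catch (...) {
            // "except +MemoryError": any C++ exception becomes MemoryError.
            g_pending_error = "MemoryError";
            return -1;
        }
        return r;
    } catch (const std::invalid_argument&) {
        g_pending_error = "ValueError";    // generic "except +" mapping
    } catch (...) {
        g_pending_error = "RuntimeError";  // fallback for undeclared cases
    }
    return -1;
}
```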

I think it's worth a try to see if it works.

BTW, is there a reason we can't allow users to declare C++ exceptions in
their .pxd files, and then support catching them in Python try-except
syntax? Just translating them verbatim to the C++ structures, based on
the type of the exception that gets caught?

(Although, given the discussion so far, maybe try-finally is more important
than try-catch, and the former can't know when it needs to be mapped into
C++ code and when not ...)
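To illustrate the idea of declaring C++ exception types in a .pxd file and catching them with Python try-except syntax, here is a hypothetical hierarchy (`StorageError`, `DiskFull`) and the clause-by-clause C++ translation such a feature might produce. The feature is only being proposed here, so this is a sketch rather than generated code. C++ tries handlers in order of appearance, so the more derived type has to come first, just as with Python except clauses.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical user-declared C++ exception hierarchy.
struct StorageError : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct DiskFull : StorageError {
    using StorageError::StorageError;
};

// Hypothetical external function: 1 -> DiskFull, 2 -> StorageError.
static void write_record(int kind) {
    if (kind == 1) throw DiskFull("disk full");
    if (kind == 2) throw StorageError("storage failure");
}

// A Python-level "try: ... except DiskFull: ... except StorageError: ..."
// translated clause by clause into C++ handlers, in the same order.
static std::string handle(int kind) {
    try {
        write_record(kind);
        return "ok";
    } catch (const DiskFull&) {
        return "DiskFull";
    } catch (const StorageError& e) {
        return std::string("StorageError: ") + e.what();
    }
}
```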

Stefan


Re: [Cython] [cython-users] C++: how to handle failures of 'new'?

2012-07-05 Thread Dag Sverre Seljebotn


mark florisson wrote:

>Why would you have to do that? Can't you just insert a try/catch per
>try/except or try/finally block, or if absent, the function body. That
>will still work with the way temporaries are cleaned up. (It should
>also be implemented for parallel/prange sections).

One disadvantage is that you don't get the source code line of the .pyx file
in the stack trace, which is often exactly the information you are looking
for (even worse, since the C++ stack isn't in the stack trace, the line number
of what seems like the 'ultimate cause' is not there). Having to surround
statements with try/except just to pinpoint which one is raising the exception
would be incredibly irritating.
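A sketch of the trade-off raised here, with hypothetical names throughout: with one catch per function body, the handler by itself only knows which function failed. Recovering the .pyx line would need extra bookkeeping, here a hypothetical `lineno` local updated before each statement, which reintroduces some per-statement work that the body-level catch was meant to avoid.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical record of a pending Python error and its .pyx line.
struct PendingError {
    std::string type;
    int lineno;
};
static PendingError g_err{"", 0};

static void step(bool fail) {  // hypothetical external call
    if (fail) throw std::runtime_error("boom");
}

// Without the lineno updates, the catch block could only report
// "somewhere in this function".
static int body(bool fail_first, bool fail_second) {
    int lineno = 0;  // hypothetical: the .pyx line currently executing
    try {
        lineno = 10;
        step(fail_first);
        lineno = 11;
        step(fail_second);
        return 0;
    } catch (...) {
        g_err = PendingError{"RuntimeError", lineno};
        return -1;
    }
}
```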

Dag


Re: [Cython] [cython-users] C++: how to handle failures of 'new'?

2012-07-05 Thread mark florisson
On 5 July 2012 21:46, Dag Sverre Seljebotn wrote:
>
>
> mark florisson wrote:
>
>>Why would you have to do that? Can't you just insert a try/catch per
>>try/except or try/finally block, or if absent, the function body. That
>>will still work with the way temporaries are cleaned up. (It should
>>also be implemented for parallel/prange sections).
>
> One disadvantage is that you don't get source code line for the .pyx file in 
> the stack trace. Which is often exactly the information you are looking for 
> (even worse, since C++ stack isn't in the stack trace, the lineno for what 
> seems like the ' ultimate cause' is not there). Having to surround statements 
> with try/except just to pinpoint which one is raising the exception would be 
> incredibly irritating.
>
> Dag

Oh yeah, good point. Maybe we could use these zero-cost exceptions for
cdef functions in Cython though, instead of error checks (if it
appears to make any significant difference). Basically instead of the
'error' argument in CEP 526. It'd need to version that ABI as well...
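A minimal comparison, with hypothetical function names, of the two styles mentioned here: an error-return convention where every call site checks the result (roughly what `except -1` implies), versus C++ exceptions where the non-throwing path has no explicit checks. With table-based unwinding, as GCC and Clang use on common platforms, the cost is mostly paid when an exception is actually thrown; whether that is a measurable win for cdef functions is exactly the open question.

```cpp
#include <stdexcept>

// Style 1: error-return convention; every caller must test the result.
static int div_checked(int a, int b, int* out) {
    if (b == 0) return -1;
    *out = a / b;
    return 0;
}

static int caller_checked(int a, int b, int* out) {
    int tmp;
    if (div_checked(a, b, &tmp) == -1) return -1;  // explicit check
    if (div_checked(tmp, 2, out) == -1) return -1; // explicit check
    return 0;
}

// Style 2: C++ exceptions; the happy path carries no checks, and the
// exception propagates through caller_throwing automatically.
static int div_throwing(int a, int b) {
    if (b == 0) throw std::domain_error("division by zero");
    return a / b;
}

static int caller_throwing(int a, int b) {
    return div_throwing(div_throwing(a, b), 2);
}
```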


[Cython] Odd behavior with std::string and .decode()

2012-07-05 Thread Barry Warsaw
I'm currently exploring using Cython to provide new Python 3 bindings for
Xapian.  I'm pretty much a Cython n00b but the documentation is great, and I
was able to pretty quickly get something really simple working.  I'm using
Cython 0.15 on Ubuntu 12.04 with Python 3.2 and Xapian 1.2.12.  I've pushed my
current branch to github:

https://github.com/warsaw/xapian/tree/py3/xapian-bindings/python3

There you'll see my xapianlib.pxd and xapian.pyx files.

Where I'm seeing some odd behavior is in trying to expose the
Xapian::TermGenerator.get_description() method.  This returns a std::string
and I'm trying to create a `description` property that coerces this to unicode
before returning it to Python.  Here's the relevant code:

-snip snip-
cdef class TermGenerator:
    cdef xapianlib.TermGenerator * _this

    def __cinit__(self):
        self._this = new xapianlib.TermGenerator()

    def __dealloc__(self):
        del self._this

    property description:
        def __get__(self):
            as_bytes = self._this.get_description().c_str()
            #return as_bytes
            return as_bytes.decode('utf-8')
-snip snip-

I'm sure I'm doing something naive or stupid, but the problem is that
as written above, .description is returning nonsense.

% python
Python 3.2.3 (default, May  3 2012, 15:51:42) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xapian
>>> tg = xapian.TermGenerator()
>>> tg.description
'\x00\x00\x00\x00_\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

If instead I return just the bytes object (i.e. what
.get_description().c_str() returns), then I get something more like what I expect.

% python
Python 3.2.3 (default, May  3 2012, 15:51:42) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xapian
>>> tg = xapian.TermGenerator()
>>> tg.description
b'Xapian::TermGenerator(stem=Xapian::Stem(none), doc=Document(Xapian::Document::Internal()), termpos=0)'
>>> tg.description.decode('utf-8')
'Xapian::TermGenerator(stem=Xapian::Stem(none), doc=Document(Xapian::Document::Internal()), termpos=0)'

I looked at the generated code in the first example, but didn't really see
anything obvious.  There are no NULs in the char* description afaict.  I
haven't yet tested Cython 0.16 or 0.17 to see if this behaves differently.

Is this a bug or am I doing something stupid?

Cheers,
-Barry




Re: [Cython] [cython-users] C++: how to handle failures of 'new'?

2012-07-05 Thread Dag Sverre Seljebotn


mark florisson wrote:

>Oh yeah, good point. Maybe we could use these zero-cost exceptions for
>cdef functions in Cython though, instead of error checks (if it
>appears to make any significant difference). Basically instead of the
>'error' argument in CEP 526. It'

Re: [Cython] Odd behavior with std::string and .decode()

2012-07-05 Thread Stefan Behnel
Hi Barry,

Barry Warsaw, 06.07.2012 00:29:
> I'm currently exploring using Cython to provide new Python 3 bindings for
> Xapian.  I'm pretty much a Cython n00b but the documentation is great, and I
> was able to pretty quickly get something really simple working.  I'm using
> Cython 0.15 on Ubuntu 12.04 with Python 3.2 and Xapian 1.2.12.  I've pushed my
> current branch to github:
> 
> https://github.com/warsaw/xapian/tree/py3/xapian-bindings/python3
> 
> There you'll see my xapianlib.pxd and xapian.pyx files.
> 
> Where I'm seeing some odd behavior is in trying to expose the
> Xapian::TermGenerator.get_description() method.  This returns a std::string
> and I'm trying to create a `description` property that coerces this to unicode
> before returning it to Python.  Here's the relevant code:
> 
> -snip snip-
> cdef class TermGenerator:
>     cdef xapianlib.TermGenerator * _this
> 
>     def __cinit__(self):
>         self._this = new xapianlib.TermGenerator()
> 
>     def __dealloc__(self):
>         del self._this
> 
>     property description:
>         def __get__(self):
>             as_bytes = self._this.get_description().c_str()
>             #return as_bytes
>             return as_bytes.decode('utf-8')
> -snip snip-
> 
> I'm sure I'm doing something naive or stupid, but the problem is that
> as written above, .description is returning nonsense.
> 
> % python
> Python 3.2.3 (default, May  3 2012, 15:51:42) 
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import xapian
> >>> tg = xapian.TermGenerator()
> >>> tg.description
> '\x00\x00\x00\x00_\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
> 
> If instead, I return just the bytes object (i.e. what
> .get_description().c_str() returns), then I get more like what I expect.
> 
> % python
> Python 3.2.3 (default, May  3 2012, 15:51:42) 
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import xapian
> >>> tg = xapian.TermGenerator()
> >>> tg.description
> b'Xapian::TermGenerator(stem=Xapian::Stem(none), doc=Document(Xapian::Document::Internal()), termpos=0)'
> >>> tg.description.decode('utf-8')
> 'Xapian::TermGenerator(stem=Xapian::Stem(none), doc=Document(Xapian::Document::Internal()), termpos=0)'

This is very weird behaviour indeed. I wouldn't know why that should
happen. What "return as_bytes.decode('utf-8')" does is that it calls
strlen() to see how long the string is, then it calls the UTF-8 decode
C-API function with that.

The string that get_description() returns is allocated internally in the
C++ object, right? So it can't suddenly die or something?

One thing I would generally suggest is to do this:

descr = self._this.get_description()
return descr.data()[:descr.size()].decode('utf-8')

Avoids the call to strlen() by explicitly slicing the pointer. Also avoids
needing to make sure the C string is 0-terminated.
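One plausible explanation for the garbage output, assuming Cython inferred `as_bytes` as a C `char*` (an assumption, not something confirmed in this message): `c_str()` is called on the temporary std::string returned by `get_description()`, and C++ destroys that temporary at the end of the statement, leaving a dangling pointer. The sketch below uses hypothetical `get_description` and `decode_bytes` stand-ins to show the safe shape of the suggestion above: bind the string to a named variable, then pass `data()` and `size()` explicitly. Passing the length also preserves embedded NUL bytes, where a strlen()-based conversion would stop early.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a C++ method that, like
// Xapian::TermGenerator::get_description(), returns std::string by value.
static std::string get_description() {
    return "TermGenerator(example)";
}

// Hypothetical stand-in for a decode call taking an explicit
// (pointer, length) pair, so it needs neither strlen() nor a NUL terminator.
static std::string decode_bytes(const char* s, std::size_t n) {
    return std::string(s, n);
}

// Risky pattern, shown only as a comment:
//     const char* p = get_description().c_str();
// The temporary std::string dies at the end of that full-expression, so
// p dangles and any later read through it is undefined behaviour.

static std::string description() {
    // Named variable: it lives until the end of this scope, so data()
    // stays valid for the decode call below.
    std::string descr = get_description();
    return decode_bytes(descr.data(), descr.size());
}
```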


> I looked at the generated code in the first example, but didn't really see
> anything obvious.  There are no NULs in the char* description afaict.  I
> haven't yet tested Cython 0.16 or 0.17 to see if this behaves differently.

I wouldn't know any differences off the top of my head, except that 0.17
has generally better support for STL containers and std::string (but that's
unrelated to this failure). I'm planning to enable direct support for
cpp_string.decode(...) as well, but that's not implemented yet. It would
basically generate the verbose code above automatically.


> Is this a bug or am I doing something stupid?

Definitely not doing something stupid, but I have no idea why this should
go wrong.

Stefan