Re: [Cython] Specializing str methods?

2011-05-19 Thread Stefan Behnel

Hi,

the right place to discuss this is the cython-devel mailing list.

John Ehresman, 19.05.2011 18:17:

On 4/13/11 11:23 AM, Stefan Behnel wrote:

You can add BuiltinMethod entries for the "str" type in Builtin.py and
map them to a function name like "__Pyx_Str_PyString_WhatEver()". Then,
define "utility_code" code blocks for each of the methods that #defines
that name to the appropriate PyString_*() or PyUnicode_*() C-API
function, depending on the Python version it runs with. There are a lot
of examples for that in Builtin.py already.


I've finally got back to this and have the start of an implementation. I've
attached a diff, though I'm not sure what the preferred workflow is. Should
I push to github and / or create a bug in trac?


It's best to create a pull request on github, potentially accompanied by a 
request for review on the developer mailing list. We get a notification 
about pull requests, but it's still better to give us an idea in what state 
you think your patch is and what kind of review you need. The review can 
then happen by commenting on the changes directly in github.




I'll probably add more
methods, but wanted to get feedback before doing so.


Certainly the right decision.



I also intend to contribute other patches in the future.


Please do. :)



One thing that I didn't expect is that I originally declared the method as
BuiltinMethod("startswith", "TT", "O", "_Pyx_StrStartsWith",
utility_code=str_startswith_utility_code),
and expected the specialization to only be used when both self and arg
where a str, but instead cython always used the specialization and added a
check for arg being a str before the specialization was called. I ended up
writing a function that works for both unicode & bytes self and an object arg.


You don't really have to care about the type of the argument when applying 
this optimisation. As long as it's clear that the object that provides the 
method is a "str" (although it's potentially None even then!), it's clear 
what the method will do when being called, even if we don't know the 
argument type at compile time. Just provide a fallback, as you did anyway.


I'll comment on your patch below.

diff --git a/Cython/Compiler/Builtin.py b/Cython/Compiler/Builtin.py
+str_startswith_utility_code = UtilityCode(
+proto="""


"proto" is for the forward declaration of the function signature. Except 
for very simple cases (short macros) the implementation goes into "impl".



+static PyObject* _Pyx_StrStartsWith(PyObject* self, PyObject* arg)
+{
+  PyObject* ret;
+
+  if ( PyBytes_CheckExact(self) && PyBytes_CheckExact(arg) ) {
+if ( PyBytes_GET_SIZE(self) < PyBytes_GET_SIZE(arg) )
+  ret = Py_False;
+else if ( memcmp(PyBytes_AS_STRING(self), PyBytes_AS_STRING(arg),
+ PyBytes_GET_SIZE(arg)) == 0 )
+  ret = Py_True;
+else
+  ret = Py_False;
+Py_INCREF(ret);
+return ret;
+  }
+
+  if ( PyUnicode_CheckExact(self) && PyUnicode_CheckExact(arg) ) {
+if ( PyUnicode_GET_SIZE(self) < PyUnicode_GET_SIZE(arg) )
+  ret = Py_False;
+else if ( memcmp(PyUnicode_AS_UNICODE(self), PyUnicode_AS_UNICODE(arg),
+ PyUnicode_GET_DATA_SIZE(arg)) == 0 )
+  ret = Py_True;
+else
+  ret = Py_False;
+Py_INCREF(ret);
+return ret;
+  }
+
+  return PyObject_CallMethod(self, "startswith", "O", arg);


I'd split this up into two functions that can be applied to different types 
independently, and wrap it in a function that calls either one depending on 
PY_MAJOR_VERSION. After all, "str" is *known* to be "bytes" in Py2 and 
"unicode" in Py3.



@@ -522,8 +557,12 @@ builtin_types_table = [
 ]),

 ("bytes",   "PyBytes_Type",[]),
-("str", "PyString_Type",   []),
+("str", "PyString_Type",   [BuiltinMethod("startswith", "TO", "O", 
"_Pyx_StrStartsWith",
+ 
utility_code=str_startswith_utility_code),

+]),


It's better to return a "bint" to avoid returning Python objects for 
True/False. The type character for that is "b". There's a 
"__Pyx_PyObject_IsTrue(obj)" macro that you can use to convert the fallback 
return value. It's usually also ok to copy error handling code from 
CPython, and to raise the exception directly without going into the 
fallback case.



 ("unicode", "PyUnicode_Type",  [BuiltinMethod("join",  "TO",   "T", 
"PyUnicode_Join"),
+BuiltinMethod("startswith", "TO", "O", 
"_Pyx_StrStartsWith",
+ 
utility_code=str_startswith_utility_code),

 ]),


This is already optimised in Cython/Compiler/Optimize.py ("tailmatch"), 
basically because it allows different numbers of arguments that need to be 
dealt with. It may be worth going the same route for "str". The decision 
usually depends on how complex the code transformation is. The method table 
in Builtins is clearly limited.




diff --git a/tests/run/stropts.p

[Cython] Scoped expressions and generators problem

2011-05-19 Thread Vitja Makarov
I don't know how to handle scopedexprs in generators, here are some examples:

[(yield i) for i in l] or [i for i in yield]

In Python3 language mode scoped expression is created for list
comprehension and loop variable is moved there.
So now it isn't stored inside generator closure and is lost between yields.

Btw there is one more problem I hope that's easy to solve:
yield expression inside scoped expression is counted twice.

-- 
vitja.
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Specializing str methods?

2011-05-19 Thread John Ehresman

On 5/19/11 3:35 PM, Stefan Behnel wrote:

This is already optimised in Cython/Compiler/Optimize.py ("tailmatch"),
basically because it allows different numbers of arguments that need to
be dealt with. It may be worth going the same route for "str". The
decision usually depends on how complex the code transformation is. The
method table in Builtins is clearly limited.


I'll look at doing this in Optimize.py.  The hunk of my patch below 
fixes a generic bug, though.  Without it a utility function for a method 
is used, but the definition of it is never injected into the C file.


Thanks,

John

diff --git a/Cython/Compiler/ExprNodes.py b/Cython/Compiler/ExprNodes.py
index 6ab8be0..ac7ca67 100755
--- a/Cython/Compiler/ExprNodes.py
+++ b/Cython/Compiler/ExprNodes.py
@@ -3746,6 +3746,8 @@ class AttributeNode(ExprNode):
 if entry.is_variable or entry.is_cmethod:
 self.type = entry.type
 self.member = entry.cname
+if entry.utility_code:
+env.use_utility_code(entry.utility_code)
 return
 else:
 # If it's not a variable or C method, it must be a 
Python

___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel