Hi,

the right place to discuss this is the cython-devel mailing list.

John Ehresman, 19.05.2011 18:17:
On 4/13/11 11:23 AM, Stefan Behnel wrote:
You can add BuiltinMethod entries for the "str" type in Builtin.py and
map them to a function name like "__Pyx_Str_PyString_WhatEver()". Then,
define "utility_code" code blocks for each of the methods that #defines
that name to the appropriate PyString_*() or PyUnicode_*() C-API
function, depending on the Python version it runs with. There are a lot
of examples for that in Builtin.py already.

I've finally got back to this and have the start of an implementation. I've
attached a diff, though I'm not sure what the preferred workflow is. Should
I push to github and / or create a bug in trac?

It's best to create a pull request on github, potentially accompanied by a request for review on the developer mailing list. We get a notification about pull requests, but it's still better to give us an idea in what state you think your patch is and what kind of review you need. The review can then happen by commenting on the changes directly in github.


I'll probably add more
methods, but wanted to get feedback before doing so.

Certainly the right decision.


I also intend to contribute other patches in the future.

Please do. :)


One thing that I didn't expect is that I originally declared the method as
BuiltinMethod("startswith", "TT", "O", "_Pyx_StrStartsWith",
utility_code=str_startswith_utility_code),
and expected the specialization to only be used when both self and arg
where a str, but instead cython always used the specialization and added a
check for arg being a str before the specialization was called. I ended up
writing a function that works for both unicode & bytes self and an object arg.

You don't really have to care about the type of the argument when applying this optimisation. As long as it's clear that the object that provides the method is a "str" (although it's potentially None even then!), it's clear what the method will do when being called, even if we don't know the argument type at compile time. Just provide a fallback, as you did anyway.

I'll comment on your patch below.

diff --git a/Cython/Compiler/Builtin.py b/Cython/Compiler/Builtin.py
+str_startswith_utility_code = UtilityCode(
+proto="""


"proto" is for the forward declaration of the function signature. Except for very simple cases (short macros) the implementation goes into "impl".


+static PyObject* _Pyx_StrStartsWith(PyObject* self, PyObject* arg)
+{
+  PyObject* ret;
+
+  if ( PyBytes_CheckExact(self) && PyBytes_CheckExact(arg) ) {
+    if ( PyBytes_GET_SIZE(self) < PyBytes_GET_SIZE(arg) )
+      ret = Py_False;
+    else if ( memcmp(PyBytes_AS_STRING(self), PyBytes_AS_STRING(arg),
+                     PyBytes_GET_SIZE(arg)) == 0 )
+      ret = Py_True;
+    else
+      ret = Py_False;
+    Py_INCREF(ret);
+    return ret;
+  }
+
+  if ( PyUnicode_CheckExact(self) && PyUnicode_CheckExact(arg) ) {
+    if ( PyUnicode_GET_SIZE(self) < PyUnicode_GET_SIZE(arg) )
+      ret = Py_False;
+    else if ( memcmp(PyUnicode_AS_UNICODE(self), PyUnicode_AS_UNICODE(arg),
+                     PyUnicode_GET_DATA_SIZE(arg)) == 0 )
+      ret = Py_True;
+    else
+      ret = Py_False;
+    Py_INCREF(ret);
+    return ret;
+  }
+
+  return PyObject_CallMethod(self, "startswith", "O", arg);


I'd split this up into two functions that can be applied to different types independently, and wrap it in a function that calls either one depending on PY_MAJOR_VERSION. After all, "str" is *known* to be "bytes" in Py2 and "unicode" in Py3.


@@ -522,8 +557,12 @@ builtin_types_table = [
                                     ]),

     ("bytes",   "PyBytes_Type",    []),
-    ("str",     "PyString_Type",   []),
+ ("str", "PyString_Type", [BuiltinMethod("startswith", "TO", "O", "_Pyx_StrStartsWith", + utility_code=str_startswith_utility_code),
+                                    ]),


It's better to return a "bint" to avoid returning Python objects for True/False. The type character for that is "b". There's a "__Pyx_PyObject_IsTrue(obj)" macro that you can use to convert the fallback return value. It's usually also ok to copy error handling code from CPython, and to raise the exception directly without going into the fallback case.


("unicode", "PyUnicode_Type", [BuiltinMethod("join", "TO", "T", "PyUnicode_Join"), + BuiltinMethod("startswith", "TO", "O", "_Pyx_StrStartsWith", + utility_code=str_startswith_utility_code),
                                     ]),


This is already optimised in Cython/Compiler/Optimize.py ("tailmatch"), basically because it allows different numbers of arguments that need to be dealt with. It may be worth going the same route for "str". The decision usually depends on how complex the code transformation is. The method table in Builtins is clearly limited.



diff --git a/tests/run/stropts.pyx b/tests/run/stropts.pyx
@@ -0,0 +1,84 @@
+cimport cython
+
+@cython.test_assert_path_exists(
+    '//AttributeNode[@member="_Pyx_StrStartsWith"]')
+def str_startswith_str():
+    """
+    >>> str_startswith_str()
+    True
+    """
+
+    cdef str a
+    cdef str b
+
+    a = ''
+    b = ''
+    return a.startswith(b)
[...]

I usually just implement one or a few test functions and use the doctest to pass in different values.

Stefan
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel

Reply via email to