Re: [Cython] CEP1000: Native dispatch through callables
Nathaniel Smith wrote: >On Fri, Apr 13, 2012 at 11:22 PM, Dag Sverre Seljebotn > wrote: >> >> >> Robert Bradshaw wrote: >> >>>On Fri, Apr 13, 2012 at 2:24 PM, Nathaniel Smith >wrote: On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn wrote: > Ah, I didn't think about 6-bit or huffman. Certainly helps. > > I'm almost +1 on your proposal now, but a couple of more ideas: > > 1) Let the key (the size_t) spill over to the next specialization >>>entry if > it is too large; and prepend that key with a continuation code >(two >>>size-ts > could together say "iii)-d\0\0" on 32 bit systems with 8bit >>>encoding, using > - as continuation). The key-based caller will expect a >continuation >>>if it > knows about the specialization, and the prepended char will >prevent >>>spurios > matches against the overspilled slot. > > We could even use the pointers for part of the continuation... I am really lost here. Why is any of this complicated encoding >stuff better than interning? Interning takes one line of code, is >>>incredibly cheap (one dict lookup per call site and function definition), and >it lets you check any possible signature (even complicated ones >>>involving memoryviews) by doing a single-word comparison. And best of all, >you don't have to think hard to make sure you got the encoding right. >;-) On a 32-bit system, pointers are smaller than a size_t, but more expressive! You can still do binary search if you want, etc. Is the problem just that interning requires a runtime calculation? Because >I feel like C users (like numpy) will want to compute these >compressed codes at module-init anyway, and those of us with a fancy compiler capable of computing them ahead of time (like Cython) can instruct that fancy compiler to compute them at module-init time just as easily? >>> >>>Good question. >>> >>>The primary disadvantage of interning that I see is memory locality. 
>I >>>suppose if all the C-level caches of interned values were co-located, >>>this may not be as big of an issue. Not being able to compare against >>>compile-time constants may thwart some optimization opportunities, >but >>>that's less clear. > >I would like to see some demonstration of this. E.g., you can run this: > >echo -e '#include \nint main(int argc, char ** argv) { >return strcmp(argv[0], "a"); }' | gcc -S -x c - -o - -O2 | less > >Looks to me like for a short, known-at-compile-time string, with >optimization on, gcc implements it by basically sticking the string in >a global variable and then using a pointer... (If I do argv[0] == >(char *)0x1234, then it places the constant value directly into the >instruction stream. Strangely enough, it does *not* inline the >constant value even if I do memcmp(&argv[0], "\1\2\3\4", 4), which >should be exactly equivalent...!) Right. So: - With keys you have the *option* of hardcoding them, and then they will be in the instruction stream (rather than the instruction stream containing, essentially, a pointer to the key). - With interned, you always have a pointer you must dereference in the instruction stream. > >I think gcc is just as likely to stick a bunch of > static void * interned_dd_to_d; > static void * interned_ll_to_l; >next to each other in the memory image as it is to stick a bunch of >equivalent manifest constants. If you're worried, make it static void >* interned_signatures[NUM_SIGNATURES] -- then they'll definitely be >next to each other. > >>>It also requires coordination common repository, but I suppose one >>>would just stick a set in some standard module (or leverage Python's >>>interning). >> >> More problems: >> >> 1) It doesn't work well with multiple interpreter states. Ok, nothing >works with that at the moment, but it is on the roadmap for Python and >we should not make it worse. > >This isn't a criticism, but I'd like to see a reference to the work in >this direction! 
>My impression was that it's been on the roadmap for
>maybe a decade, in a really desultory fashion:
>http://docs.python.org/faq/library.html#can-t-we-get-rid-of-the-global-interpreter-lock
>So if it's actually happening that's quite interesting.

I wasn't referring to the GIL, but multiple interpreters (where objects from one cannot be used in another). PEP3121 mentions it as one of the things it prepares for. Perhaps that didn't go anywhere, I don't really know.

>
>> You basically *need* a thread safe store separate from any python
>> interpreter; though pythread.h does not rely on the interpreter state;
>> which helps.
>
>Anyway, yes, if you can't rely on the interpreter then you'd need some
>place to store the intern table, but I'm not sure why this would be a
>problem (in Python 3.6 or whenever it becomes relevant).
>
>> 2) you end up with the known comparison values in read-write memory
>> segments rather than readonly segments, which is probably worse o
Re: [Cython] CEP1000: Native dispatch through callables
Greg Ewing wrote:
>Dag Sverre Seljebotn wrote:
>
>> 1) It doesn't work well with multiple interpreter states. Ok, nothing works
>> with that at the moment, but it is on the roadmap for Python
>
>Is it really? I got the impression that it's not considered feasible,
>since it would require massive changes to the entire implementation
>and totally break the existing C API. Has someone thought of a way
>around those problems?

I was just referring to the offhand comments in PEP3121, but I guess that PEP had multiple reasons, and perhaps this particular argument had no significance... You know this a lot better than me.

Dag

>
>--
>Greg

___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
Re: [Cython] CEP1000: Native dispatch through callables
Dag Sverre Seljebotn, 14.04.2012 10:41:
> Greg Ewing wrote:
>> Dag Sverre Seljebotn wrote:
>>
>>> 1) It doesn't work well with multiple interpreter states. Ok, nothing
>>> works with that at the moment, but it is on the roadmap for Python
>>
>> Is it really? I got the impression that it's not considered feasible,
>> since it would require massive changes to the entire implementation
>> and totally break the existing C API. Has someone thought of a way
>> around those problems?
>
> I was just referring to the offhand comments in PEP3121, but I guess that PEP
> had multiple reasons, and perhaps this particular argument had no
> significance...

IIRC, the last status was that even after this PEP, Py3 still has serious issues with keeping extension modules in separate interpreters. And this probably isn't worth doing anything about because it won't work without a major effort in all sorts of places. And I never heard that any extension module even tried to support this.

I don't think we should invest too much thought into this direction.

Stefan
Re: [Cython] Cython 0.16 RC 1
2012-04-12 16:38:37 mark florisson wrote:
> Yet another release candidate, this will hopefully be the last before
> the 0.16 release. You can grab it from here:
> http://wiki.cython.org/ReleaseNotes-0.16
>
> There were several fixes for the numpy attribute rewrite, memoryviews
> and fused types. Accessing the 'base' attribute of a typed ndarray now
> goes through the object layer, which means direct assignment is no
> longer supported.
>
> If there are any problems, please let us know.

4 tests still fail with Python 3.2 (currently 3.2.3).
All tests pass with Python 2.6.8, 2.7.3 and 3.1.5.

Failures with Python 3.2:

==
FAIL: NestedWith (withstat)
Doctest: withstat.NestedWith
--
Traceback (most recent call last):
  File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for withstat.NestedWith
  File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat.cpython-32.so", line unknown line number, in NestedWith

--
File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat.cpython-32.so", line ?, in withstat.NestedWith
Failed example:
    NestedWith().runTest()
Exception raised:
    Traceback (most recent call last):
      File "/usr/lib64/python3.2/doctest.py", line 1288, in __run
        compileflags, 1), test.globs)
      File "", line 1, in
        NestedWith().runTest()
      File "withstat.pyx", line 183, in withstat.NestedWith.runTest (withstat.c:5574)
      File "withstat.pyx", line 222, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:8101)
      File "withstat.pyx", line 223, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:7989)
      File "withstat.pyx", line 224, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:7838)
      File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func
        DeprecationWarning, 2)
      File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning
        file.write(formatwarning(message, category, filename, lineno, line))
      File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning
        line = linecache.getline(filename, lineno) if line is None else line
      File "/usr/lib64/python3.2/linecache.py", line 15, in getline
        lines = getlines(filename, module_globals)
      File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines
        return self.save_linecache_getlines(filename, module_globals)
      File "/usr/lib64/python3.2/linecache.py", line 41, in getlines
        return updatecache(filename, module_globals)
      File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache
        lines = fp.readlines()
      File "/usr/lib64/python3.2/codecs.py", line 300, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 40: invalid start byte

==
FAIL: NestedWith (withstat)
Doctest: withstat.NestedWith
--
Traceback (most recent call last):
  File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for withstat.NestedWith
  File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/cpp/withstat.cpython-32.so", line unknown line number, in NestedWith

--
File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/cpp/withstat.cpython-32.so", line ?, in withstat.NestedWith
Failed example:
    NestedWith().runTest()
Exception raised:
    Traceback (most recent call last):
      File "/usr/lib64/python3.2/doctest.py", line 1288, in __run
        compileflags, 1), test.globs)
      File "", line 1, in
        NestedWith().runTest()
      File "withstat.pyx", line 183, in withstat.NestedWith.runTest (withstat.cpp:5574)
      File "withstat.pyx", line 222, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:8101)
      File "withstat.pyx", line 223, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:7989)
      File "withstat.pyx", line 224, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:7838)
      File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func
        DeprecationWarning, 2)
      File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning
        file.write(formatwarning(message, category, filename, lineno, line))
      File "/usr/lib64/python3.2/warn
Re: [Cython] Cython 0.16 RC 1
On 12 April 2012 22:00, Wes McKinney wrote:
> On Thu, Apr 12, 2012 at 10:38 AM, mark florisson wrote:
>> Yet another release candidate, this will hopefully be the last before
>> the 0.16 release. You can grab it from here:
>> http://wiki.cython.org/ReleaseNotes-0.16
>>
>> There were several fixes for the numpy attribute rewrite, memoryviews
>> and fused types. Accessing the 'base' attribute of a typed ndarray now
>> goes through the object layer, which means direct assignment is no
>> longer supported.
>>
>> If there are any problems, please let us know.
>
> I'm unable to build pandas using git master Cython. I just released
> pandas 0.7.3 today which has no issues at all with 0.15.1:
>
> http://pypi.python.org/pypi/pandas
>
> For example:
>
> 16:57 ~/code/pandas (master)$ python setup.py build_ext --inplace
> running build_ext
> cythoning pandas/src/tseries.pyx to pandas/src/tseries.c
>
> Error compiling Cython file:
>
> ...
> self.store = {}
>
> ptr = malloc(self.depth * sizeof(int32_t*))
>
> for i in range(self.depth):
> ptr[i] = ( label_arrays[i]).data
> ^
>
>
> pandas/src/tseries.pyx:107:59: Compiler crash in AnalyseExpressionsTransform
>
> ModuleNode.body = StatListNode(tseries.pyx:1:0)
> StatListNode.stats[23] = StatListNode(tseries.pyx:86:5)
> StatListNode.stats[0] = CClassDefNode(tseries.pyx:86:5,
> as_name = u'MultiMap',
> class_name = u'MultiMap',
> doc = u'\n Need to come up with a better data structure for
> multi-level indexing\n ',
> module_name = u'',
> visibility = u'private')
> CClassDefNode.body = StatListNode(tseries.pyx:91:4)
> StatListNode.stats[1] = StatListNode(tseries.pyx:95:4)
> StatListNode.stats[0] = DefNode(tseries.pyx:95:4,
> modifiers = [...]/0,
> name = u'__init__',
> num_required_args = 2,
> py_wrapper_required = True,
> reqd_kw_flags_cname = '0',
> used = True)
> File 'Nodes.py', line 342, in analyse_expressions:
> StatListNode(tseries.pyx:96:8)
> File 'Nodes.py', line 342, in analyse_expressions:
> StatListNode(tseries.pyx:106:8)
> File 'Nodes.py', line 5903, in analyse_expressions:
> ForInStatNode(tseries.pyx:106:8)
> File 'Nodes.py', line 342, in analyse_expressions:
> StatListNode(tseries.pyx:107:21)
> File 'Nodes.py', line 4767, in analyse_expressions:
> SingleAssignmentNode(tseries.pyx:107:21)
> File 'Nodes.py', line 4872, in analyse_types:
> SingleAssignmentNode(tseries.pyx:107:21)
> File 'ExprNodes.py', line 7082, in analyse_types:
> TypecastNode(tseries.pyx:107:21,
> result_is_used = True,
> use_managed_ref = True)
> File 'ExprNodes.py', line 4274, in analyse_types:
> AttributeNode(tseries.pyx:107:59,
> attribute = u'data',
> initialized_check = True,
> is_attribute = 1,
> member = u'data',
> needs_none_check = True,
> op = '->',
> result_is_used = True,
> use_managed_ref = True)
> File 'ExprNodes.py', line 4360, in analyse_as_ordinary_attribute:
> AttributeNode(tseries.pyx:107:59,
> attribute = u'data',
> initialized_check = True,
> is_attribute = 1,
> member = u'data',
> needs_none_check = True,
> op = '->',
> result_is_used = True,
> use_managed_ref = True)
> File 'ExprNodes.py', line 4436, in analyse_attribute:
> AttributeNode(tseries.pyx:107:59,
> attribute = u'data',
> initialized_check = True,
> is_attribute = 1,
> member = u'data',
> needs_none_check = True,
> op = '->',
> result_is_used = True,
> use_managed_ref = True)
>
> Compiler crash traceback from this point on:
> File "/home/wesm/code/repos/cython/Cython/Compiler/ExprNodes.py",
> line 4436, in analyse_attribute
> replacement_node = numpy_transform_attribute_node(self)
> File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py",
> line 18, in numpy_transform_attribute_node
> numpy_pxd_scope = node.obj.entry.type.scope.parent_scope
> AttributeError: 'TypecastNode' object has no attribute 'entry'
> building 'pandas._tseries' extension
> creating build
> creating build/temp.linux-x86_64-2.7
> creating build/temp.linux-x86_64-2.7/pandas
> creating build/temp.linux-x86_64-2.7/pandas/src
> gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -O2 -fPIC
> -I/home/wesm/epd/lib/python2.7/site-packages/numpy/core/include
> -I/home/wesm/epd/include/python2.7 -c pandas/src/tseries.c -o
> build/temp.linux-x86_64-2.7/pandas/src/tseries.o
> pandas/src/tseries.c:1:2: error: #error Do not use this file, it is
> the result of a failed Cython compilation.
> error: command 'gcc' failed with exit status 1
>
>
> -
>
> I kludged this particular line in the pandas/timeseries branch so it
> will build on git master Cy
Re: [Cython] Cython 0.16 RC 1
Arfrever Frehtes Taifersar Arahesis, 14.04.2012 12:16:
> 4 tests still fail with Python 3.2 (currently 3.2.3).
> All tests pass with Python 2.6.8, 2.7.3 and 3.1.5.

Thanks for the report.

> Failures with Python 3.2:
>
> ==
> FAIL: NestedWith (withstat)
> Doctest: withstat.NestedWith
> --
> Traceback (most recent call last):
> File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest
> raise self.failureException(self.format_failure(new.getvalue()))
> AssertionError: Failed doctest test for withstat.NestedWith
> File
> "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat.cpython-32.so",
> line unknown line number, in NestedWith
>
> --
> File
> "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat.cpython-32.so",
> line ?, in withstat.NestedWith
> Failed example:
> NestedWith().runTest()
> Exception raised:
> Traceback (most recent call last):
> File "/usr/lib64/python3.2/doctest.py", line 1288, in __run
> compileflags, 1), test.globs)
> File "", line 1, in
> NestedWith().runTest()
> File "withstat.pyx", line 183, in withstat.NestedWith.runTest
> (withstat.c:5574)
> File "withstat.pyx", line 222, in
> withstat.NestedWith.testEnterReturnsTuple (withstat.c:8101)
> File "withstat.pyx", line 223, in
> withstat.NestedWith.testEnterReturnsTuple (withstat.c:7989)
> File "withstat.pyx", line 224, in
> withstat.NestedWith.testEnterReturnsTuple (withstat.c:7838)
> File "/usr/lib64/python3.2/unittest/case.py", line 1169, in
> deprecated_func
> DeprecationWarning, 2)
> File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning
> file.write(formatwarning(message, category, filename, lineno, line))
> File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning
> line = linecache.getline(filename, lineno) if line is None else line
> File "/usr/lib64/python3.2/linecache.py", line 15, in getline
> lines = getlines(filename, module_globals)
> File "/usr/lib64/python3.2/doctest.py", line 1372, in
> __patched_linecache_getlines
> return self.save_linecache_getlines(filename, module_globals)
> File "/usr/lib64/python3.2/linecache.py", line 41, in getlines
> return updatecache(filename, module_globals)
> File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache
> lines = fp.readlines()
> File "/usr/lib64/python3.2/codecs.py", line 300, in decode
> (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 40:
> invalid start byte

This looks like it's trying to print a DeprecationWarning because of some unittest related problem and fails to format the message for it. Doesn't look Cython related, but I'll see if I can find out something about this.

Stefan
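The final error in the traceback above can be reproduced in isolation. The following is a minimal sketch, not taken from the thread: the input bytes are hypothetical and were chosen only so that the failing position matches the reported one. It shows that a lone byte 0xf8 is never a valid start byte in UTF-8, while the same byte decodes without error as Latin-1 (where it is the character U+00F8). Whether the file being read in the failing test really is Latin-1 encoded is an assumption; the thread does not say.

```python
# Hypothetical input: 40 ASCII bytes followed by the single byte 0xf8,
# chosen only so that the error position matches the traceback.
raw = b"x" * 40 + b"\xf8"


def try_decode(data, encoding):
    """Return the decoded text, or a short description of the failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "%s at position %d: %s" % (type(exc).__name__, exc.start, exc.reason)


utf8_result = try_decode(raw, "utf-8")      # fails: 0xf8 cannot start a UTF-8 sequence
latin1_result = try_decode(raw, "latin-1")  # succeeds: every byte maps to a code point

print(utf8_result)
print(repr(latin1_result[-1]))
```

This is consistent with the reading that the failure occurs while formatting a warning message (reading a source line back from disk), not in the code under test.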
Re: [Cython] Cython 0.16 RC 1
On 14 April 2012 12:00, Stefan Behnel wrote:
> Arfrever Frehtes Taifersar Arahesis, 14.04.2012 12:16:
>> 4 tests still fail with Python 3.2 (currently 3.2.3).
>> All tests pass with Python 2.6.8, 2.7.3 and 3.1.5.
>
> Thanks for the report.
>

Indeed, I just pushed a fix here:
https://github.com/markflorisson88/cython/tree/release

Arfrever, could you retry running this test, i.e.

python runtests.py -vv 'run\.withstat'

Thanks for the help!

>> Failures with Python 3.2:
>>
>> ==
>> FAIL: NestedWith (withstat)
>> Doctest: withstat.NestedWith
>> --
>> Traceback (most recent call last):
>> File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest
>> raise self.failureException(self.format_failure(new.getvalue()))
>> AssertionError: Failed doctest test for withstat.NestedWith
>> File
>> "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat.cpython-32.so",
>> line unknown line number, in NestedWith
>>
>> --
>> File
>> "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat.cpython-32.so",
>> line ?, in withstat.NestedWith
>> Failed example:
>> NestedWith().runTest()
>> Exception raised:
>> Traceback (most recent call last):
>> File "/usr/lib64/python3.2/doctest.py", line 1288, in __run
>> compileflags, 1), test.globs)
>> File "", line 1, in
>> NestedWith().runTest()
>> File "withstat.pyx", line 183, in withstat.NestedWith.runTest
>> (withstat.c:5574)
>> File "withstat.pyx", line 222, in
>> withstat.NestedWith.testEnterReturnsTuple (withstat.c:8101)
>> File "withstat.pyx", line 223, in
>> withstat.NestedWith.testEnterReturnsTuple (withstat.c:7989)
>> File "withstat.pyx", line 224, in
>> withstat.NestedWith.testEnterReturnsTuple (withstat.c:7838)
>> File "/usr/lib64/python3.2/unittest/case.py", line 1169, in
>> deprecated_func
>> DeprecationWarning, 2)
>> File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning
>> file.write(formatwarning(message, category, filename, lineno, line))
>> File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning
>> line = linecache.getline(filename, lineno) if line is None else line
>> File "/usr/lib64/python3.2/linecache.py", line 15, in getline
>> lines = getlines(filename, module_globals)
>> File "/usr/lib64/python3.2/doctest.py", line 1372, in
>> __patched_linecache_getlines
>> return self.save_linecache_getlines(filename, module_globals)
>> File "/usr/lib64/python3.2/linecache.py", line 41, in getlines
>> return updatecache(filename, module_globals)
>> File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache
>> lines = fp.readlines()
>> File "/usr/lib64/python3.2/codecs.py", line 300, in decode
>> (result, consumed) = self._buffer_decode(data, self.errors, final)
>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 40:
>> invalid start byte
>
> This looks like it's trying to print a DeprecationWarning because of some
> unittest related problem and fails to format the message for it. Doesn't
> look Cython related, but I'll see if I can find out something about this.
>
> Stefan
[Cython] Fwd: sage.math cluster OFF until about 9:30am.
Original-Message
From: William Stein (wstein a gmail.com)

Hi,

As previously announced a few times, the sage.math cluster is OFF, due to electrical work that is being done in the building that houses the server room. I expect the machines to be off until about 9:30am. Obviously, anything that runs on those machines -- including http://sagenb.org, http://sagemath.org, etc. -- is off.

I don't expect major havoc in getting things back up, since I had a chance to properly shut down all the machines.

-- William
Re: [Cython] Cython 0.16 RC 1
On 04/14/2012 12:46 PM, mark florisson wrote: On 12 April 2012 22:00, Wes McKinney wrote: On Thu, Apr 12, 2012 at 10:38 AM, mark florisson wrote: Yet another release candidate, this will hopefully be the last before the 0.16 release. You can grab it from here: http://wiki.cython.org/ReleaseNotes-0.16 There were several fixes for the numpy attribute rewrite, memoryviews and fused types. Accessing the 'base' attribute of a typed ndarray now goes through the object layer, which means direct assignment is no longer supported. If there are any problems, please let us know. ___ cython-devel mailing list cython-devel@python.org http://mail.python.org/mailman/listinfo/cython-devel I'm unable to build pandas using git master Cython. I just released pandas 0.7.3 today which has no issues at all with 0.15.1: http://pypi.python.org/pypi/pandas For example: 16:57 ~/code/pandas (master)$ python setup.py build_ext --inplace running build_ext cythoning pandas/src/tseries.pyx to pandas/src/tseries.c Error compiling Cython file: ... 
self.store = {} ptr = malloc(self.depth * sizeof(int32_t*)) for i in range(self.depth): ptr[i] = ( label_arrays[i]).data ^ pandas/src/tseries.pyx:107:59: Compiler crash in AnalyseExpressionsTransform ModuleNode.body = StatListNode(tseries.pyx:1:0) StatListNode.stats[23] = StatListNode(tseries.pyx:86:5) StatListNode.stats[0] = CClassDefNode(tseries.pyx:86:5, as_name = u'MultiMap', class_name = u'MultiMap', doc = u'\nNeed to come up with a better data structure for multi-level indexing\n', module_name = u'', visibility = u'private') CClassDefNode.body = StatListNode(tseries.pyx:91:4) StatListNode.stats[1] = StatListNode(tseries.pyx:95:4) StatListNode.stats[0] = DefNode(tseries.pyx:95:4, modifiers = [...]/0, name = u'__init__', num_required_args = 2, py_wrapper_required = True, reqd_kw_flags_cname = '0', used = True) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(tseries.pyx:96:8) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(tseries.pyx:106:8) File 'Nodes.py', line 5903, in analyse_expressions: ForInStatNode(tseries.pyx:106:8) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(tseries.pyx:107:21) File 'Nodes.py', line 4767, in analyse_expressions: SingleAssignmentNode(tseries.pyx:107:21) File 'Nodes.py', line 4872, in analyse_types: SingleAssignmentNode(tseries.pyx:107:21) File 'ExprNodes.py', line 7082, in analyse_types: TypecastNode(tseries.pyx:107:21, result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4274, in analyse_types: AttributeNode(tseries.pyx:107:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4360, in analyse_as_ordinary_attribute: AttributeNode(tseries.pyx:107:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) File 
'ExprNodes.py', line 4436, in analyse_attribute: AttributeNode(tseries.pyx:107:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) Compiler crash traceback from this point on: File "/home/wesm/code/repos/cython/Cython/Compiler/ExprNodes.py", line 4436, in analyse_attribute replacement_node = numpy_transform_attribute_node(self) File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py", line 18, in numpy_transform_attribute_node numpy_pxd_scope = node.obj.entry.type.scope.parent_scope AttributeError: 'TypecastNode' object has no attribute 'entry' building 'pandas._tseries' extension creating build creating build/temp.linux-x86_64-2.7 creating build/temp.linux-x86_64-2.7/pandas creating build/temp.linux-x86_64-2.7/pandas/src gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -O2 -fPIC -I/home/wesm/epd/lib/python2.7/site-packages/numpy/core/include -I/home/wesm/epd/include/python2.7 -c pandas/src/tseries.c -o build/temp.linux-x86_64-2.7/pandas/src/tseries.o pandas/src/tseries.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation. error: command 'gcc' failed with exit status 1 - I kludged this particular line in the pandas/timeseries branch so it will build on git master Cython, but I was treated to dozens of failures, errors, and finally a segfault in the middle of the test suite. Suffice to say I'm not sure I would advise you to relea
Re: [Cython] Cython 0.16 RC 1
mark florisson, 14.04.2012 13:02:
> I just pushed a fix here:
> https://github.com/markflorisson88/cython/tree/release

Note that I had already pushed a couple of other fixes into the release branch of the main repo.

Stefan
Re: [Cython] Cython 0.16 RC 1
On 14 April 2012 14:57, Dag Sverre Seljebotn wrote: > On 04/14/2012 12:46 PM, mark florisson wrote: >> >> On 12 April 2012 22:00, Wes McKinney wrote: >>> >>> On Thu, Apr 12, 2012 at 10:38 AM, mark florisson >>> wrote: Yet another release candidate, this will hopefully be the last before the 0.16 release. You can grab it from here: http://wiki.cython.org/ReleaseNotes-0.16 There were several fixes for the numpy attribute rewrite, memoryviews and fused types. Accessing the 'base' attribute of a typed ndarray now goes through the object layer, which means direct assignment is no longer supported. If there are any problems, please let us know. ___ cython-devel mailing list cython-devel@python.org http://mail.python.org/mailman/listinfo/cython-devel >>> >>> >>> I'm unable to build pandas using git master Cython. I just released >>> pandas 0.7.3 today which has no issues at all with 0.15.1: >>> >>> http://pypi.python.org/pypi/pandas >>> >>> For example: >>> >>> 16:57 ~/code/pandas (master)$ python setup.py build_ext --inplace >>> running build_ext >>> cythoning pandas/src/tseries.pyx to pandas/src/tseries.c >>> >>> Error compiling Cython file: >>> >>> ... 
>>> self.store = {} >>> >>> ptr = malloc(self.depth * sizeof(int32_t*)) >>> >>> for i in range(self.depth): >>> ptr[i] = ( label_arrays[i]).data >>> ^ >>> >>> >>> pandas/src/tseries.pyx:107:59: Compiler crash in >>> AnalyseExpressionsTransform >>> >>> ModuleNode.body = StatListNode(tseries.pyx:1:0) >>> StatListNode.stats[23] = StatListNode(tseries.pyx:86:5) >>> StatListNode.stats[0] = CClassDefNode(tseries.pyx:86:5, >>> as_name = u'MultiMap', >>> class_name = u'MultiMap', >>> doc = u'\n Need to come up with a better data structure for >>> multi-level indexing\n ', >>> module_name = u'', >>> visibility = u'private') >>> CClassDefNode.body = StatListNode(tseries.pyx:91:4) >>> StatListNode.stats[1] = StatListNode(tseries.pyx:95:4) >>> StatListNode.stats[0] = DefNode(tseries.pyx:95:4, >>> modifiers = [...]/0, >>> name = u'__init__', >>> num_required_args = 2, >>> py_wrapper_required = True, >>> reqd_kw_flags_cname = '0', >>> used = True) >>> File 'Nodes.py', line 342, in analyse_expressions: >>> StatListNode(tseries.pyx:96:8) >>> File 'Nodes.py', line 342, in analyse_expressions: >>> StatListNode(tseries.pyx:106:8) >>> File 'Nodes.py', line 5903, in analyse_expressions: >>> ForInStatNode(tseries.pyx:106:8) >>> File 'Nodes.py', line 342, in analyse_expressions: >>> StatListNode(tseries.pyx:107:21) >>> File 'Nodes.py', line 4767, in analyse_expressions: >>> SingleAssignmentNode(tseries.pyx:107:21) >>> File 'Nodes.py', line 4872, in analyse_types: >>> SingleAssignmentNode(tseries.pyx:107:21) >>> File 'ExprNodes.py', line 7082, in analyse_types: >>> TypecastNode(tseries.pyx:107:21, >>> result_is_used = True, >>> use_managed_ref = True) >>> File 'ExprNodes.py', line 4274, in analyse_types: >>> AttributeNode(tseries.pyx:107:59, >>> attribute = u'data', >>> initialized_check = True, >>> is_attribute = 1, >>> member = u'data', >>> needs_none_check = True, >>> op = '->', >>> result_is_used = True, >>> use_managed_ref = True) >>> File 'ExprNodes.py', line 4360, in 
analyse_as_ordinary_attribute: >>> AttributeNode(tseries.pyx:107:59, >>> attribute = u'data', >>> initialized_check = True, >>> is_attribute = 1, >>> member = u'data', >>> needs_none_check = True, >>> op = '->', >>> result_is_used = True, >>> use_managed_ref = True) >>> File 'ExprNodes.py', line 4436, in analyse_attribute: >>> AttributeNode(tseries.pyx:107:59, >>> attribute = u'data', >>> initialized_check = True, >>> is_attribute = 1, >>> member = u'data', >>> needs_none_check = True, >>> op = '->', >>> result_is_used = True, >>> use_managed_ref = True) >>> >>> Compiler crash traceback from this point on: >>> File "/home/wesm/code/repos/cython/Cython/Compiler/ExprNodes.py", >>> line 4436, in analyse_attribute >>> replacement_node = numpy_transform_attribute_node(self) >>> File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py", >>> line 18, in numpy_transform_attribute_node >>> numpy_pxd_scope = node.obj.entry.type.scope.parent_scope >>> AttributeError: 'TypecastNode' object has no attribute 'entry' >>> building 'pandas._tseries' extension >>> creating build >>> creating build/temp.linux-x86_64-2.7 >>> creating build/temp.linux-x86_64-2.7/pandas >>> creating build/temp.linux-x86_64-2.7/pandas/src >>> gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -O2 -fPIC >>> -I/home/wesm/epd/lib/python2.7/site-packages/numpy/core/include >>> -I/home/wesm/epd/in
Re: [Cython] CEP1000: Native dispatch through callables
On Sat, Apr 14, 2012 at 1:56 AM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 14.04.2012 10:41:
>> Greg Ewing wrote:
>>> Dag Sverre Seljebotn wrote:
>>>> 1) It doesn't work well with multiple interpreter states. Ok, nothing
>>>> works with that at the moment, but it is on the roadmap for Python
>>>
>>> Is it really? I got the impression that it's not considered feasible,
>>> since it would require massive changes to the entire implementation
>>> and totally break the existing C API. Has someone thought of a way
>>> around those problems?
>>
>> I was just referring to the offhand comments in PEP3121, but I guess that
>> PEP had multiple reasons, and perhaps this particular argument had no
>> significance...
>
> IIRC, the last status was that even after this PEP, Py3 still has serious
> issues with keeping extension modules in separate interpreters. And this
> probably isn't worth doing anything about because it won't work without a
> major effort in all sorts of places. And I never heard that any extension
> module even tried to support this.
>
> I don't think we should invest too much thought into this direction.

I had never even heard of this PEP before this thread, but this certainly seems reasonable to me.

Aside from this, there is some value with the inlined signature in that a pure C library can easily support the ABI as well. Has anyone done any experiments/timings to see if having constants vs. globals even matters?

- Robert
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
Re: [Cython] CEP1000: Native dispatch through callables
On 04/14/2012 08:10 PM, Robert Bradshaw wrote:
> On Sat, Apr 14, 2012 at 1:56 AM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 14.04.2012 10:41:
>>> Greg Ewing wrote:
>>>> Dag Sverre Seljebotn wrote:
>>>>> 1) It doesn't work well with multiple interpreter states. Ok, nothing
>>>>> works with that at the moment, but it is on the roadmap for Python
>>>>
>>>> Is it really? I got the impression that it's not considered feasible,
>>>> since it would require massive changes to the entire implementation
>>>> and totally break the existing C API. Has someone thought of a way
>>>> around those problems?
>>>
>>> I was just referring to the offhand comments in PEP3121, but I guess that
>>> PEP had multiple reasons, and perhaps this particular argument had no
>>> significance...
>>
>> IIRC, the last status was that even after this PEP, Py3 still has serious
>> issues with keeping extension modules in separate interpreters. And this
>> probably isn't worth doing anything about because it won't work without a
>> major effort in all sorts of places. And I never heard that any extension
>> module even tried to support this.
>>
>> I don't think we should invest too much thought into this direction.

A shame; short of getting rid of the GIL, multiple interpreter states would be my favourite shared-memory parallel computation approach, as they could share NumPy buffers (and other C-level structures), without worrying about allocating in process-shared memory (and most data structures, like std::map, won't even work (or, work portably and reliably) in process-shared memory anyway).

Multiple separate interpreter states would be a very nice way of getting the benefits of multi-threading without the disadvantages.

> I had never even heard of this PEP before this thread, but this
> certainly seems reasonable to me.
>
> Aside from this, there is some value with the inlined signature in
> that a pure C library can easily support the ABI as well.

Yes -- I think both "sides" of this discussion prefer their approach out of aesthetics more than performance :-) I'll post a revamped CEP in a minute to at least try to sum them up.

> Has anyone done any experiments/timings to see if having constants
> vs. globals even matters?

It'd be interesting to see; won't have time myself until Monday earliest.

Dag
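For readers trying to picture the two "sides" being compared here, the following is a minimal sketch in C, not code from the thread or from any CEP draft. Every name in it (`key_entry`, `interned_entry`, `pack_key`, `intern_signature`, `KEY_DD_TO_D`, the fixed-size pool) is a hypothetical stand-in. It only contrasts the shape of the two checks: comparing against a compile-time constant key versus comparing against an interned pointer that has to be obtained at run time. It makes no claim about which is faster.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic function pointer type used for storage (hypothetical; the CEP
 * drafts in this thread use void* instead). */
typedef void (*generic_fn)(void);

/* "Key" side: the signature is packed into an integer. */
typedef struct { uint64_t key; generic_fn func; } key_entry;

/* "Interning" side: the signature pointer comes from a registry. */
typedef struct { const char *signature; generic_fn func; } interned_entry;

/* Toy encoding (assumption): up to 8 signature bytes, byte i shifted
 * left by 8*i bits, zero padded. */
static uint64_t pack_key(const char *sig) {
    uint64_t key = 0;
    size_t n = strlen(sig);
    if (n > 8) return 0;               /* too long for this toy encoding */
    for (size_t i = 0; i < n; i++)
        key |= (uint64_t)(unsigned char)sig[i] << (8 * i);
    return key;
}

/* "dd)d" under the toy encoding; a caller can hard-code this constant. */
#define KEY_DD_TO_D UINT64_C(0x64296464)

/* Toy registry (assumption): fixed pool, linear search, never freed. */
#define MAX_INTERNED 64
#define MAX_SIG_LEN 31
static char interned_pool[MAX_INTERNED][MAX_SIG_LEN + 1];
static size_t interned_count = 0;

static const char *intern_signature(const char *sig) {
    if (strlen(sig) > MAX_SIG_LEN) return NULL;
    for (size_t i = 0; i < interned_count; i++)
        if (strcmp(interned_pool[i], sig) == 0)
            return interned_pool[i];   /* already interned */
    if (interned_count == MAX_INTERNED) return NULL;
    strcpy(interned_pool[interned_count], sig);
    return interned_pool[interned_count++];
}

/* Key lookup: one integer comparison per entry. */
static generic_fn find_by_key(const key_entry *t, size_t n, uint64_t key) {
    for (size_t i = 0; i < n; i++)
        if (t[i].key == key) return t[i].func;
    return NULL;
}

/* Interned lookup: one pointer comparison per entry; the needle must
 * itself come from intern_signature(). */
static generic_fn find_by_interned(const interned_entry *t, size_t n,
                                   const char *interned_sig) {
    for (size_t i = 0; i < n; i++)
        if (t[i].signature == interned_sig) return t[i].func;
    return NULL;
}

/* Sample callee with signature "dd)d". */
static double add_dd(double a, double b) { return a + b; }
```

In this sketch the key-based caller needs nothing but the constant, while the interning caller has to call `intern_signature("dd)d")` once (for example at module init) and keep the result in a global. That is the "constants vs. globals" difference the question is about.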
Re: [Cython] CEP1000: Native dispatch through callables
On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote:
> Travis Oliphant recently raised the issue on the NumPy list of what
> mechanisms to use to box native functions produced by his Numba so that
> SciPy functions can call it, e.g. (I'm making the numba part up):

This thread is turning into one of those big ones...

But I think it is really worth it in the end; I'm getting excited about the possibility down the road of importing functions using normal Python mechanisms and still having fast calls.

Anyway, to organize discussion I've tried to revamp the CEP and describe both the intern-way and the strcmp-way.

The wiki appears to be down, so I'll post it below...

Dag

= CEP 1000: Convention for native dispatches through Python callables =

Many callable objects are simply wrappers around native code. This holds for any Cython function, f2py functions, manually written CPython extensions, Numba, etc.

Obviously, when native code calls other native code, it would be nice to skip the significant cost of boxing and unboxing all the arguments. Early binding at compile-time is only possible between different Cython modules, not between all the tools listed above.

[[enhancements/nativecall|CEP 523]] deals with Cython-specific aspects (and is out-of-date w.r.t. this CEP); this CEP is intended to be about a cross-project convention only. If successful, this CEP may be proposed as a PEP in a modified form.

Motivating example (looking a year or two into the future):

{{{
@numba
def f(x): return 2 * x

@cython.inline
def g(x : cython.double): return 3 * x

from fortranmod import h

print f(3)
print g(3)
print h(3)
print scipy.integrate.quad(f, 0.2, 3) # fast callback!
print scipy.integrate.quad(g, 0.2, 3) # fast callback!
print scipy.integrate.quad(h, 0.2, 3) # fast callback!
}}}

== The native-call slot ==

We need a ''fast'' way to probe whether a callable object supports this CEP.
Other mechanisms, such as an attribute in a dict, are too slow for many purposes (quoting robertwb: "We're trying to get a 300ns dispatch down to 10ns; you do not want a 50ns dict lookup"). (Obviously, if you call a callable in a loop you can fetch the pointer outside of the loop. But in particular if this becomes a language feature in Cython it will be used in all sorts of places.)

So we hack another type slot into existing and future CPython implementations in the following way: This CEP provides a C header that for all Python versions defines a macro {{{Py_TPFLAGS_UNOFFICIAL_EXTRAS}}} for a free bit in {{{tp_flags}}} in the {{{PyTypeObject}}}.

If present, then we extend {{{PyTypeObject}}} as follows:
{{{
typedef struct {
    PyTypeObject tp_main;
    size_t tp_unofficial_flags;
    size_t tp_nativecall_offset;
} PyUnofficialTypeObject;
}}}

{{{tp_unofficial_flags}}} is unused and should be all 0 for the time being, but can be used later to indicate features beyond this CEP.

If {{{tp_nativecall_offset != 0}}}, this CEP is supported, and the information for doing a native dispatch on a callable {{{obj}}} is located at
{{{
(char*)obj + ((PyUnofficialTypeObject*)obj->ob_type)->tp_nativecall_offset;
}}}

=== GIL-less access ===

It is OK to access the native-call table without holding the GIL. This should of course only be used to call functions that state in their signature that they don't need the GIL.

This is important for JITted callables that would like to rewrite their table as more specializations get added; if one needs to reallocate the table, the old table must linger long enough that all threads that are currently accessing it are done with it.

== Native dispatch descriptor ==

The final format for the descriptor is not agreed upon yet; this sums up the major alternatives.

The descriptor should be a list of specializations/overloads, each described by a function pointer and a signature specification string, such as "id)i" for {{{int f(int, double)}}}.
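The slot probe described under "The native-call slot" can be illustrated with a small self-contained sketch. It is not the CEP's header: the `mock_*` types are stand-ins so that it compiles without `Python.h`, and the bit chosen for `MOCK_TPFLAGS_UNOFFICIAL_EXTRAS` is an arbitrary assumption. Only the control flow (check the flag bit, check for a non-zero offset, add the offset to the object pointer) follows the text above.

```c
#include <stddef.h>

/* Stand-ins for PyTypeObject / PyObject (assumption: only the fields
 * needed for the probe are modelled). */
typedef struct mock_type {
    unsigned long tp_flags;
} mock_type;

typedef struct mock_object {
    mock_type *ob_type;
} mock_object;

/* Hypothetical free bit in tp_flags; the real value would come from the
 * header the CEP proposes. */
#define MOCK_TPFLAGS_UNOFFICIAL_EXTRAS (1UL << 22)

/* Mirrors PyUnofficialTypeObject from the text. */
typedef struct {
    mock_type tp_main;
    size_t tp_unofficial_flags;
    size_t tp_nativecall_offset;
} mock_unofficial_type;

/* Returns a pointer to the native-call information stored inside obj,
 * or NULL if the type does not support the convention. */
static void *get_nativecall_info(mock_object *obj) {
    mock_type *tp = obj->ob_type;
    if (!(tp->tp_flags & MOCK_TPFLAGS_UNOFFICIAL_EXTRAS))
        return NULL;                    /* type was not extended */
    mock_unofficial_type *ext = (mock_unofficial_type *)tp;
    if (ext->tp_nativecall_offset == 0)
        return NULL;                    /* extended, but no native-call slot */
    return (char *)obj + ext->tp_nativecall_offset;
}

/* Example callable layout (hypothetical): the descriptor would live in
 * the field that the offset points at. */
typedef struct {
    mock_object base;
    int table_placeholder;
} mock_callable;
```

The two checks are plain loads and compares, which is the point of using a type slot rather than a dict lookup. A caller that gets `NULL` back would fall back to boxing the arguments and calling through the normal Python protocol.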
The way it is stored must cater for two cases; first, when the caller expects one or more hard-coded signatures:
{{{
if (obj has signature "id)i") {
    call;
} else if (obj has signature "if)i") {
    call with promoted second argument;
} else {
    box all arguments;
    PyObject_Call;
}
}}}

The second is when a call stack is built dynamically while parsing the string. Since this has higher overhead anyway, optimizing for the first case makes sense.

=== Approach 1: Interning/run-time allocated IDs ===

1A: Let each overload have a struct
{{{
struct {
    size_t signature_id;
    char *signature;
    void *func_ptr;
};
}}}
Within each process run, there is a 1:1 between {{{signature}}} and {{{signature_id}}}. {{{signature_id}}} is allocated by some central registry.

1B: Intern the string instead:
{{{
struct {
    char *signature; /* pointer must come from the central registry */
    void *func_ptr;
};
}}}
However this is '''not''' trivial, since signature strings can be allocated on the heap (e.g., a JIT would do this), so interned strings must be memory man
Re: [Cython] CEP1000: Native dispatch through callables
On 14 April 2012 20:08, Dag Sverre Seljebotn wrote: > On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >> >> Travis Oliphant recently raised the issue on the NumPy list of what >> mechanisms to use to box native functions produced by his Numba so that >> SciPy functions can call it, e.g. (I'm making the numba part up): > > > > This thread is turning into one of those big ones... > > But I think it is really worth it in the end; I'm getting excited about the > possibility down the road of importing functions using normal Python > mechanisms and still have fast calls. > > Anyway, to organize discussion I've tried to revamp the CEP and describe > both the intern-way and the strcmp-way. > > The wiki appears to be down, so I'll post it below... > > Dag > > = CEP 1000: Convention for native dispatches through Python callables = > > Many callable objects are simply wrappers around native code. This > holds for any Cython function, f2py functions, manually > written CPython extensions, Numba, etc. > > Obviously, when native code calls other native code, it would be > nice to skip the significant cost of boxing and unboxing all the arguments. > Early binding at compile-time is only possible > between different Cython modules, not between all the tools > listed above. > > [[enhancements/nativecall|CEP 523]] deals with Cython-specific aspects > (and is out-of-date w.r.t. this CEP); this CEP is intended to be about > a cross-project convention only. If a success, this CEP may be > proposesd as a PEP in a modified form. > > Motivating example (looking a year or two into the future): > > {{{ > @numba > def f(x): return 2 * x > > @cython.inline > def g(x : cython.double): return 3 * x > > from fortranmod import h > > print f(3) > print g(3) > print h(3) > print scipy.integrate.quad(f, 0.2, 3) # fast callback! > print scipy.integrate.quad(g, 0.2, 3) # fast callback! > print scipy.integrate.quad(h, 0.2, 3) # fast callback! 
> > }}} > > == The native-call slot == > > We need ''fast'' access to probing whether a callable object supports > this CEP. Other mechanisms, such as an attribute in a dict, is too > slow for many purposes (quoting robertwb: "We're trying to get a 300ns > dispatch down to 10ns; you do not want a 50ns dict lookup"). (Obviously, > if you call a callable in a loop you can fetch the pointer outside > of the loop. But in particular if this becomes a language feature > in Cython it will be used in all sorts of places.) > > So we hack another type slot into existing and future CPython > implementations in the following way: This CEP provides a C header > that for all Python versions define a macro > {{{Py_TPFLAGS_UNOFFICIAL_EXTRAS}}} for a free bit in > {{{tp_flags}}} in the {{{PyTypeObject}}}. > > If present, then we extend {{{PyTypeObject}}} > as follows: > {{{ > typedef struct { > PyTypeObject tp_main; > size_t tp_unofficial_flags; > size_t tp_nativecall_offset; > } PyUnofficialTypeObject; > }}} > > {{{tp_unofficial_flags}}} is unused and should be all 0 for the time > being, but can be used later to indicate features beyond this CEP. > > If {{{tp_nativecall_offset != 0}}}, this CEP is supported, and > the information for doing a native dispatch on a callable {{{obj}}} > is located at > {{{ > (char*)obj + ((PyUnofficialTypeObject*)obj->ob_type)->tp_nativecall_offset; > }}} > > === GIL-less accesss === > > It is OK to access the native-call table without holding the GIL. This > should of course only be used to call functions that state in their > signature that they don't need the GIL. > > This is important for JITted callables who would like to rewrite their > table as more specializations gets added; if one needs to reallocate > the table, the old table must linger along long enough that all > threads that are currently accessing it are done with it. 
> > == Native dispatch descriptor == > > The final format for the descriptor is not agreed upon yet; this sums > up the major alternatives. > > The descriptor should be a list of specializations/overload, each > described by a function pointer and a signature specification > string, such as "id)i" for {{{int f(int, double)}}}. > > The way it is stored must cater for two cases; first, when the caller > expects one or more hard-coded signatures: > {{{ > if (obj has signature "id)i") { > call; > } else if (obj has signature "if)i") { > call with promoted second argument; > } else { > box all arguments; > PyObject_Call; > } > }}} There may be a lot of promotion/demotion (you likely only want the former) combinations, especially for multiple arguments, so perhaps it makes sense to limit ourselves a bit. For instance for numeric scalar argument types we could limit to long (and the unsigned counterparts), double and double complex. So char, short and int scalars will be promoted to long, float to double and float complex to double complex. Anything bigger, like long long etc will be matched specifically. Promotions and associated demotio
Re: [Cython] CEP1000: Native dispatch through callables
Hi,

thanks for writing this up. Comments inline as I read through it.

Dag Sverre Seljebotn, 14.04.2012 21:08:
> === GIL-less access ===
>
> It is OK to access the native-call table without holding the GIL. This
> should of course only be used to call functions that state in their
> signature that they don't need the GIL.
>
> This is important for JITted callables who would like to rewrite their
> table as more specializations gets added; if one needs to reallocate
> the table, the old table must linger along long enough that all
> threads that are currently accessing it are done with it.

The problem here is that changing the table in the face of threaded access is very likely to introduce race conditions, and the average library out there won't know when all threads are done with it. I don't think later modification is a good idea.

> == Native dispatch descriptor ==
>
> The final format for the descriptor is not agreed upon yet; this sums
> up the major alternatives.
>
> The descriptor should be a list of specializations/overload

While overloaded signatures are great for the callee, they make things much more complicated for the caller. It's no longer just one signature that either matches or not. Especially when we allow more than one expected signature, then each of them has to be compared against all exported signatures.

We'll have to see what the runtime impact and the impact on the code complexity is, I guess.

> each described by a function pointer and a signature specification
> string, such as "id)i" for {{{int f(int, double)}}}.

How do we deal with object argument types? Do we care on the caller side? Functions might have alternative signatures that differ in the type of their object parameters. Or should we handle this inside of the caller and expect that it's something like a fused function with internal dispatch in that case?

Personally, I think there is not enough to gain from object parameters that we should handle it on the caller side.
The callee can dispatch those if necessary.

What about signatures that require an object when we have a C typed value?

What about signatures that require a C typed argument when we have an arbitrary object value in our call parameters?

We should also strip the "self" argument from the parameter list of methods. That's handled by the attribute lookup before even getting at the callable.

> === Approach 1: Interning/run-time allocated IDs ===
>
>
> 1A: Let each overload have a struct
> {{{
> struct {
>     size_t signature_id;
>     char *signature;
>     void *func_ptr;
> };
> }}}
> Within each process run, there is a 1:1

mapping/relation

> between {{{signature}}} and
> {{{signature_id}}}. {{{signature_id}}} is allocated by some central
> registry.
>
> 1B: Intern the string instead:
> {{{
> struct {
>     char *signature; /* pointer must come from the central registry */
>     void *func_ptr;
> };
> }}}
> However this is '''not''' trivial, since signature strings can
> be allocated on the heap (e.g., a JIT would do this), so interned strings
> must be memory managed and reference counted.

Not necessarily, they are really short strings that could just live forever, stored efficiently by the registry in a series of larger memory blocks. It would take a while to fill up enough memory with those to become problematic. Finding an efficient lookup scheme for them might become interesting at some point, but that would also take a while.

I don't expect real-world systems to have to deal with thousands of different runtime(!) discovered signatures during one interpreter lifetime.

> Discussion
>
> '''The cost of comparing a signature''': Comparing a global variable (needle)
> to a value that is guaranteed to already be in cache (candidate match)
>
> '''Pros:'''
>
> * Conceptually simple struct format.
>
> '''Cons:'''
>
> * Requires a registry for interning strings.
>   This must be
>   "handshaked" between the implementors of this CEP (probably by
>   "first to get at {{{sys.modules["_nativecall"]}}} sticks it there"),
>   as we can't ship a common dependency library for this CEP.

... which would eventually end up in the stdlib, but could equally well come from PyPI for now. I don't see a problem with that.

Using sys.modules (or another global store) instead of an explicit import allows for dependency injection, that's good.

> === Approach 2: Efficient strcmp of verbatim signatures ===
>
> The idea is to store the full signatures and the function pointers together
> in the same memory area, but still have some structure to allow for quick
> scanning through the list.
>
> Each entry has the structure {{{[signature_string, funcptr]}}}
> where:
>
> * The signature string has variable length, but the length is
>   divisible by 8 bytes on all platforms. The {{{funcptr}}} is always
>   8 bytes (it is padded on 32-bit systems).
>
> * The total size of the entry should be divisible by 16 bytes (= the
>   signature data sho
Re: [Cython] CEP1000: Native dispatch through callables
mark florisson, 14.04.2012 23:00:
> On 14 April 2012 20:08, Dag Sverre Seljebotn wrote:
>> * TBD: Information about GIL requirements (nogil, with gil?), how
>> exceptions are reported
>
> Maybe that could be a separate list, to be consulted mostly for
> explicit casts (I think PyErr_Occurred() would be the default for
> non-object return types).

Good idea. We could have an additional "flags" field for each signature (or maybe just each callable?) that would contain orthogonal information about exception handling and GIL requirements.

Stefan
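One way the per-signature "flags" idea could look is sketched below. This is purely illustrative: the flag names, their bit positions, and the `flagged_overload` struct are assumptions made up for this sketch, and nothing in the thread fixes them. It shows the per-signature variant; the per-callable variant would move the `flags` field out of the entry.

```c
#include <stddef.h>

/* Hypothetical flag bits, kept orthogonal to the signature string. */
enum {
    NATIVECALL_NEEDS_GIL   = 1 << 0,  /* callee must be entered with the GIL held */
    NATIVECALL_CHECK_PYERR = 1 << 1   /* errors are reported via PyErr_Occurred() */
};

/* Hypothetical overload entry carrying a flags field. */
typedef struct {
    const char *signature;            /* e.g. "id)i" for int f(int, double) */
    unsigned int flags;
    void (*func_ptr)(void);           /* cast to the real type before calling */
} flagged_overload;

/* A caller that has released the GIL may only use entries for which
 * this returns non-zero. */
static int callable_without_gil(const flagged_overload *ov) {
    return (ov->flags & NATIVECALL_NEEDS_GIL) == 0;
}

/* Whether the caller has to check the Python error indicator after
 * the call returns. */
static int must_check_pyerr(const flagged_overload *ov) {
    return (ov->flags & NATIVECALL_CHECK_PYERR) != 0;
}
```

Keeping these bits outside the signature string means two callables with the same C signature still match by signature alone, and the caller decides separately whether it can satisfy the GIL and error-checking requirements.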
Re: [Cython] Cython 0.16 RC 1
On Sat, Apr 14, 2012 at 11:32 AM, mark florisson wrote: > On 14 April 2012 14:57, Dag Sverre Seljebotn > wrote: >> On 04/14/2012 12:46 PM, mark florisson wrote: >>> >>> On 12 April 2012 22:00, Wes McKinney wrote: On Thu, Apr 12, 2012 at 10:38 AM, mark florisson wrote: > > Yet another release candidate, this will hopefully be the last before > the 0.16 release. You can grab it from here: > http://wiki.cython.org/ReleaseNotes-0.16 > > There were several fixes for the numpy attribute rewrite, memoryviews > and fused types. Accessing the 'base' attribute of a typed ndarray now > goes through the object layer, which means direct assignment is no > longer supported. > > If there are any problems, please let us know. > ___ > cython-devel mailing list > cython-devel@python.org > http://mail.python.org/mailman/listinfo/cython-devel I'm unable to build pandas using git master Cython. I just released pandas 0.7.3 today which has no issues at all with 0.15.1: http://pypi.python.org/pypi/pandas For example: 16:57 ~/code/pandas (master)$ python setup.py build_ext --inplace running build_ext cythoning pandas/src/tseries.pyx to pandas/src/tseries.c Error compiling Cython file: ... 
self.store = {} ptr = malloc(self.depth * sizeof(int32_t*)) for i in range(self.depth): ptr[i] = ( label_arrays[i]).data ^ pandas/src/tseries.pyx:107:59: Compiler crash in AnalyseExpressionsTransform ModuleNode.body = StatListNode(tseries.pyx:1:0) StatListNode.stats[23] = StatListNode(tseries.pyx:86:5) StatListNode.stats[0] = CClassDefNode(tseries.pyx:86:5, as_name = u'MultiMap', class_name = u'MultiMap', doc = u'\n Need to come up with a better data structure for multi-level indexing\n ', module_name = u'', visibility = u'private') CClassDefNode.body = StatListNode(tseries.pyx:91:4) StatListNode.stats[1] = StatListNode(tseries.pyx:95:4) StatListNode.stats[0] = DefNode(tseries.pyx:95:4, modifiers = [...]/0, name = u'__init__', num_required_args = 2, py_wrapper_required = True, reqd_kw_flags_cname = '0', used = True) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(tseries.pyx:96:8) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(tseries.pyx:106:8) File 'Nodes.py', line 5903, in analyse_expressions: ForInStatNode(tseries.pyx:106:8) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(tseries.pyx:107:21) File 'Nodes.py', line 4767, in analyse_expressions: SingleAssignmentNode(tseries.pyx:107:21) File 'Nodes.py', line 4872, in analyse_types: SingleAssignmentNode(tseries.pyx:107:21) File 'ExprNodes.py', line 7082, in analyse_types: TypecastNode(tseries.pyx:107:21, result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4274, in analyse_types: AttributeNode(tseries.pyx:107:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4360, in analyse_as_ordinary_attribute: AttributeNode(tseries.pyx:107:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) File 
'ExprNodes.py', line 4436, in analyse_attribute: AttributeNode(tseries.pyx:107:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) Compiler crash traceback from this point on: File "/home/wesm/code/repos/cython/Cython/Compiler/ExprNodes.py", line 4436, in analyse_attribute replacement_node = numpy_transform_attribute_node(self) File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py", line 18, in numpy_transform_attribute_node numpy_pxd_scope = node.obj.entry.type.scope.parent_scope AttributeError: 'TypecastNode' object has no attribute 'entry' building 'pandas._tseries' extension creating build creating build/temp.linux-x86_64-2.7 creating build/temp.linux-x86_64-2.7/pandas creating build
Re: [Cython] CEP1000: Native dispatch through callables
On 14 April 2012 22:02, Stefan Behnel wrote: > Hi, > > thanks for writing this up. Comments inline as I read through it. > > Dag Sverre Seljebotn, 14.04.2012 21:08: >> === GIL-less accesss === >> >> It is OK to access the native-call table without holding the GIL. This >> should of course only be used to call functions that state in their >> signature that they don't need the GIL. >> >> This is important for JITted callables who would like to rewrite their >> table as more specializations gets added; if one needs to reallocate >> the table, the old table must linger along long enough that all >> threads that are currently accessing it are done with it. > > The problem here is that changing the table in the face of threaded access > is very likely to introduce race conditions, and the average library out > there won't know when all threads are done with it. I don't think later > modification is a good idea. > > >> == Native dispatch descriptor == >> >> The final format for the descriptor is not agreed upon yet; this sums >> up the major alternatives. >> >> The descriptor should be a list of specializations/overload > > While overloaded signatures are great for the callee, they make things much > more complicated for the caller. It's no longer just one signature that > either matches or not. Especially when we allow more than one expected > signature, then each of them has to be compared against all exported > signatures. > > We'll have to see what the runtime impact and the impact on the code > complexity is, I guess. > > >> each described by a function pointer and a signature specification >> string, such as "id)i" for {{{int f(int, double)}}}. > > How do we deal with object argument types? Do we care on the caller side? > Functions might have alternative signatures that differ in the type of > their object parameters. Or should we handle this inside of the caller and > expect that it's something like a fused function with internal dispatch in > that case? 
> > Personally, I think there is not enough to gain from object parameters that > we should handle it on the caller side. The callee can dispatch those if > necessary. > > What about signatures that require an object when we have a C typed value? > > What about signatures that require a C typed argument when we have an > arbitrary object value in our call parameters? > > We should also strip the "self" argument from the parameter list of > methods. That's handled by the attribute lookup before even getting at the > callable. > > >> === Approach 1: Interning/run-time allocated IDs === >> >> >> 1A: Let each overload have a struct >> {{{ >> struct { >> size_t signature_id; >> char *signature; >> void *func_ptr; >> }; >> }}} >> Within each process run, there is a 1:1 > > mapping/relation > >> between {{{signature}}} and >> {{{signature_id}}}. {{{signature_id}}} is allocated by some central >> registry. >> >> 1B: Intern the string instead: >> {{{ >> struct { >> char *signature; /* pointer must come from the central registry */ >> void *func_ptr; >> }; >> }}} >> However this is '''not'' trivial, since signature strings can >> be allocated on the heap (e.g., a JIT would do this), so interned strings >> must be memory managed and reference counted. > > Not necessarily, they are really short strings that could just live > forever, stored efficiently by the registry in a series of larger memory > blocks. It would take a while to fill up enough memory with those to become > problematic. Finding an efficiently lookup scheme for them might become > interesting at some point, but that would also take a while. > > I don't expect real-world systems to have to deal with thousands of > different runtime(!) discovered signatures during one interpreter lifetime. 
> > >> Discussion >> >> '''The cost of comparing a signature''': Comparing a global variable (needle) >> to a value that is guaranteed to already be in cache (candidate match) >> >> '''Pros:''' >> >> * Conceptually simple struct format. >> >> '''Cons:''' >> >> * Requires a registry for interning strings. This must be >> "handshaked" between the implementors of this CEP (probably by >> "first to get at {{{sys.modules["_nativecall"}}} sticks it there), >> as we can't ship a common dependency library for this CEP. > > ... which would eventually end up in the stdlib, but could equally well > come from PyPI for now. I don't see a problem with that. > > Using sys.modules (or another global store) instead of an explicit import > allows for dependency injection, that's good. > > >> === Approach 2: Efficient strcmp of verbatim signatures === >> >> The idea is to store the full signatures and the function pointers together >> in the same memory area, but still have some structure to allow for quick >> scanning through the list. >> >> Each entry has the structure {{{[signature_string, funcptr]}}} >> where: >> >> * The signature string has variable length, but the length is >> divis
Re: [Cython] Cython 0.16 RC 1
On 14 April 2012 22:13, Wes McKinney wrote: > On Sat, Apr 14, 2012 at 11:32 AM, mark florisson > wrote: >> On 14 April 2012 14:57, Dag Sverre Seljebotn >> wrote: >>> On 04/14/2012 12:46 PM, mark florisson wrote: On 12 April 2012 22:00, Wes McKinney wrote: > > On Thu, Apr 12, 2012 at 10:38 AM, mark florisson > wrote: >> >> Yet another release candidate, this will hopefully be the last before >> the 0.16 release. You can grab it from here: >> http://wiki.cython.org/ReleaseNotes-0.16 >> >> There were several fixes for the numpy attribute rewrite, memoryviews >> and fused types. Accessing the 'base' attribute of a typed ndarray now >> goes through the object layer, which means direct assignment is no >> longer supported. >> >> If there are any problems, please let us know. >> ___ >> cython-devel mailing list >> cython-devel@python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > > I'm unable to build pandas using git master Cython. I just released > pandas 0.7.3 today which has no issues at all with 0.15.1: > > http://pypi.python.org/pypi/pandas > > For example: > > 16:57 ~/code/pandas (master)$ python setup.py build_ext --inplace > running build_ext > cythoning pandas/src/tseries.pyx to pandas/src/tseries.c > > Error compiling Cython file: > > ... 
> self.store = {} > > ptr = malloc(self.depth * sizeof(int32_t*)) > > for i in range(self.depth): > ptr[i] = ( label_arrays[i]).data > ^ > > > pandas/src/tseries.pyx:107:59: Compiler crash in > AnalyseExpressionsTransform > > ModuleNode.body = StatListNode(tseries.pyx:1:0) > StatListNode.stats[23] = StatListNode(tseries.pyx:86:5) > StatListNode.stats[0] = CClassDefNode(tseries.pyx:86:5, > as_name = u'MultiMap', > class_name = u'MultiMap', > doc = u'\n Need to come up with a better data structure for > multi-level indexing\n ', > module_name = u'', > visibility = u'private') > CClassDefNode.body = StatListNode(tseries.pyx:91:4) > StatListNode.stats[1] = StatListNode(tseries.pyx:95:4) > StatListNode.stats[0] = DefNode(tseries.pyx:95:4, > modifiers = [...]/0, > name = u'__init__', > num_required_args = 2, > py_wrapper_required = True, > reqd_kw_flags_cname = '0', > used = True) > File 'Nodes.py', line 342, in analyse_expressions: > StatListNode(tseries.pyx:96:8) > File 'Nodes.py', line 342, in analyse_expressions: > StatListNode(tseries.pyx:106:8) > File 'Nodes.py', line 5903, in analyse_expressions: > ForInStatNode(tseries.pyx:106:8) > File 'Nodes.py', line 342, in analyse_expressions: > StatListNode(tseries.pyx:107:21) > File 'Nodes.py', line 4767, in analyse_expressions: > SingleAssignmentNode(tseries.pyx:107:21) > File 'Nodes.py', line 4872, in analyse_types: > SingleAssignmentNode(tseries.pyx:107:21) > File 'ExprNodes.py', line 7082, in analyse_types: > TypecastNode(tseries.pyx:107:21, > result_is_used = True, > use_managed_ref = True) > File 'ExprNodes.py', line 4274, in analyse_types: > AttributeNode(tseries.pyx:107:59, > attribute = u'data', > initialized_check = True, > is_attribute = 1, > member = u'data', > needs_none_check = True, > op = '->', > result_is_used = True, > use_managed_ref = True) > File 'ExprNodes.py', line 4360, in analyse_as_ordinary_attribute: > AttributeNode(tseries.pyx:107:59, > attribute = u'data', > initialized_check = True, > 
is_attribute = 1, > member = u'data', > needs_none_check = True, > op = '->', > result_is_used = True, > use_managed_ref = True) > File 'ExprNodes.py', line 4436, in analyse_attribute: > AttributeNode(tseries.pyx:107:59, > attribute = u'data', > initialized_check = True, > is_attribute = 1, > member = u'data', > needs_none_check = True, > op = '->', > result_is_used = True, > use_managed_ref = True) > > Compiler crash traceback from this point on: > File "/home/wesm/code/repos/cython/Cython/Compiler/ExprNodes.py", > line 4436, in analyse_attribute > replacement_node = numpy_transform_attribute_node(self) > File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py", > line 18, in numpy_transform_attribute_node > numpy_pxd_scope = node.obj.entry.type.scope.parent_scope > AttributeError: 'TypecastNode' object has no attribute 'entry'
Re: [Cython] CEP1000: Native dispatch through callables
Stefan Behnel wrote:
> mark florisson, 14.04.2012 23:00:
>> On 14 April 2012 20:08, Dag Sverre Seljebotn wrote:
>>> * TBD: Information about GIL requirements (nogil, with gil?), how
>>> exceptions are reported
>>
>> Maybe that could be a separate list, to be consulted mostly for
>> explicit casts (I think PyErr_Occurred() would be the default for
>> non-object return types).
>
> Good idea. We could have an additional "flags" field for each signature
> (or maybe just each callable?) that would contain orthogonal information
> about exception handling and GIL requirements.

I don't think gil/nogil is orthogonal at all; I think you could export
both versions as two different overloads (so that one can jump past
gil-acquisition in with-gil-functions, etc)

Dag
Re: [Cython] CEP1000: Native dispatch through callables
Robert Bradshaw wrote:
> Has anyone done any experiments/timings to see if having constants vs.
> globals even matters?

My gut feeling is that one extra memory read is going to be
insignificant compared to the time taken by the call itself
and whatever it does. But of course gut feelings are always
better when backed up (or refuted!) by measurements.

--
Greg
Re: [Cython] CEP1000: Native dispatch through callables
Dag Sverre Seljebotn wrote:
> if (obj has signature "id)i") {

This is an aside, but is it really necessary to define the
signature syntax in a way that involves unmatched parens?
Some editors (such as the one I like to use) get confused
by this, even when they're inside quotes.

The answer "get a better editor" would be entirely
appropriate if there were some advantage to this syntax,
over a non-unbalanced one, but I can't see any.

--
Greg
Re: [Cython] CEP1000: Native dispatch through callables
On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote: > On 14 April 2012 20:08, Dag Sverre Seljebotn > wrote: >> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>> >>> Travis Oliphant recently raised the issue on the NumPy list of what >>> mechanisms to use to box native functions produced by his Numba so that >>> SciPy functions can call it, e.g. (I'm making the numba part up): >> >> >> >> This thread is turning into one of those big ones... >> >> But I think it is really worth it in the end; I'm getting excited about the >> possibility down the road of importing functions using normal Python >> mechanisms and still have fast calls. >> >> Anyway, to organize discussion I've tried to revamp the CEP and describe >> both the intern-way and the strcmp-way. >> >> The wiki appears to be down, so I'll post it below... >> >> Dag >> >> = CEP 1000: Convention for native dispatches through Python callables = >> >> Many callable objects are simply wrappers around native code. This >> holds for any Cython function, f2py functions, manually >> written CPython extensions, Numba, etc. >> >> Obviously, when native code calls other native code, it would be >> nice to skip the significant cost of boxing and unboxing all the arguments. >> Early binding at compile-time is only possible >> between different Cython modules, not between all the tools >> listed above. >> >> [[enhancements/nativecall|CEP 523]] deals with Cython-specific aspects >> (and is out-of-date w.r.t. this CEP); this CEP is intended to be about >> a cross-project convention only. If a success, this CEP may be >> proposesd as a PEP in a modified form. >> >> Motivating example (looking a year or two into the future): >> >> {{{ >> @numba >> def f(x): return 2 * x >> >> @cython.inline >> def g(x : cython.double): return 3 * x >> >> from fortranmod import h >> >> print f(3) >> print g(3) >> print h(3) >> print scipy.integrate.quad(f, 0.2, 3) # fast callback! >> print scipy.integrate.quad(g, 0.2, 3) # fast callback! 
>> print scipy.integrate.quad(h, 0.2, 3) # fast callback! >> >> }}} >> >> == The native-call slot == >> >> We need ''fast'' access to probing whether a callable object supports >> this CEP. Other mechanisms, such as an attribute in a dict, is too >> slow for many purposes (quoting robertwb: "We're trying to get a 300ns >> dispatch down to 10ns; you do not want a 50ns dict lookup"). (Obviously, >> if you call a callable in a loop you can fetch the pointer outside >> of the loop. But in particular if this becomes a language feature >> in Cython it will be used in all sorts of places.) >> >> So we hack another type slot into existing and future CPython >> implementations in the following way: This CEP provides a C header >> that for all Python versions define a macro >> {{{Py_TPFLAGS_UNOFFICIAL_EXTRAS}}} for a free bit in >> {{{tp_flags}}} in the {{{PyTypeObject}}}. >> >> If present, then we extend {{{PyTypeObject}}} >> as follows: >> {{{ >> typedef struct { >> PyTypeObject tp_main; >> size_t tp_unofficial_flags; >> size_t tp_nativecall_offset; >> } PyUnofficialTypeObject; >> }}} >> >> {{{tp_unofficial_flags}}} is unused and should be all 0 for the time >> being, but can be used later to indicate features beyond this CEP. >> >> If {{{tp_nativecall_offset != 0}}}, this CEP is supported, and >> the information for doing a native dispatch on a callable {{{obj}}} >> is located at >> {{{ >> (char*)obj + ((PyUnofficialTypeObject*)obj->ob_type)->tp_nativecall_offset; >> }}} >> >> === GIL-less accesss === >> >> It is OK to access the native-call table without holding the GIL. This >> should of course only be used to call functions that state in their >> signature that they don't need the GIL. >> >> This is important for JITted callables who would like to rewrite their >> table as more specializations gets added; if one needs to reallocate >> the table, the old table must linger along long enough that all >> threads that are currently accessing it are done with it. 
>> >> == Native dispatch descriptor == >> >> The final format for the descriptor is not agreed upon yet; this sums >> up the major alternatives. >> >> The descriptor should be a list of specializations/overload, each >> described by a function pointer and a signature specification >> string, such as "id)i" for {{{int f(int, double)}}}. >> >> The way it is stored must cater for two cases; first, when the caller >> expects one or more hard-coded signatures: >> {{{ >> if (obj has signature "id)i") { >> call; >> } else if (obj has signature "if)i") { >> call with promoted second argument; >> } else { >> box all arguments; >> PyObject_Call; >> } >> }}} > > There may be a lot of promotion/demotion (you likely only want the > former) combinations, especially for multiple arguments, so perhaps it > makes sense to limit ourselves a bit. For instance for numeric scalar > argument types we could limit to long (and the unsigned counterparts), > double and double complex. > > So char
Re: [Cython] CEP1000: Native dispatch through callables
On Sat, Apr 14, 2012 at 5:28 PM, Greg Ewing wrote:
> Robert Bradshaw wrote:
>
>> Has anyone done any experiments/timings to see if having constants vs.
>> globals even matters?
>
> My gut feeling is that one extra memory read is going to be
> insignificant compared to the time taken by the call itself
> and whatever it does.

This is most valuable for really fast calls (e.g. a user-defined
double -> double), and compilers (and processors) have evolved to a
point that they're often surprising and difficult to reason about.

> But of course gut feelings are always
> better when backed up (or refuted!) by measurements.

I agree with your gut feeling (where insignificant to me is <3%) but
can't rule it out, and data trumps consensus :).

> This is an aside, but is it really necessary to define the
> signature syntax in a way that involves unmatched parens?
> Some editors (such as the one I like to use) get confused
> by this, even when they're inside quotes.
>
> The answer "get a better editor" would be entirely
> appropriate if there were some advantage to this syntax,
> over a non-unbalanced one, but I can't see any.

Brevity, especially if the signature is inlined. (Encoding could take
care of this by, e.g. ignoring the redundant opening, or we could just
write di=d.)

- Robert
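To make the constants-versus-globals question a little more concrete, here is a minimal Python sketch of the "key" idea: a short signature string is packed into one word-sized integer, so that a match is a single integer comparison against a value the caller could hard-code. The 8-byte word, the little-endian byte order and the name `pack_signature` are assumptions made for illustration only; the thread has not settled on any encoding.

```python
WORD_BYTES = 8  # assumption: a 64-bit size_t


def pack_signature(sig: str) -> int:
    """Pack a short ASCII signature into one word-sized integer key."""
    raw = sig.encode("ascii")
    if len(raw) > WORD_BYTES:
        raise ValueError("signature does not fit in one word: %r" % sig)
    # NUL-pad to the full word so every key has the same width.
    return int.from_bytes(raw.ljust(WORD_BYTES, b"\0"), "little")


# In the real design the caller's key would be a compile-time constant;
# here it is simply computed once up front.
EXPECTED = pack_signature("id)i")

print(hex(EXPECTED))
print(pack_signature("id)i") == EXPECTED)
print(pack_signature("if)i") == EXPECTED)
```

Whether such a constant in the instruction stream actually beats a load from a global is exactly the open question above, and this sketch says nothing about timings.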
Re: [Cython] CEP1000: Native dispatch through callables
Robert Bradshaw, 15.04.2012 07:59:
> On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote:
>> There may be a lot of promotion/demotion (you likely only want the
>> former) combinations, especially for multiple arguments, so perhaps it
>> makes sense to limit ourselves a bit. For instance for numeric scalar
>> argument types we could limit to long (and the unsigned counterparts),
>> double and double complex.
>>
>> So char, short and int scalars will be
>> promoted to long, float to double and float complex to double complex.
>> Anything bigger, like long long etc will be matched specifically.
>> Promotions and associated demotions if necessary in the callee should
>> be fairly cheap compared to checking all combinations or going through
>> the python layer.
>
> True, though this could be a convention rather than a requirement of
> the spec. Long vs. < long seems natural, but are there any systems
> where (scalar) float still has an advantage over double?
>
> Of course pointers like float* vs double* can't be promoted, so we
> would still need this kind of type declaration.

Yes, passing data sets as C arrays requires proper knowledge about their
memory layout on both sides.

OTOH, we are talking about functions that would otherwise be called through
Python, so this could only apply for buffers anyway. So why not require a
Py_buffer* as argument for them?

Stefan
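The promotion convention quoted above can be sketched as a small lookup table. This is only an illustration in Python: the use of C type names as plain strings and the helper name `promote` are assumptions, and the unsigned entries reflect one reading of "and the unsigned counterparts".

```python
# Promotion table as proposed in the thread: small integers widen to
# long, float to double, float complex to double complex.
PROMOTIONS = {
    "char": "long",
    "short": "long",
    "int": "long",
    "unsigned char": "unsigned long",
    "unsigned short": "unsigned long",
    "unsigned int": "unsigned long",
    "float": "double",
    "float complex": "double complex",
}


def promote(arg_types):
    """Return the promoted form of a tuple of C scalar type names."""
    # Types without an entry (long long, pointers such as float*, ...)
    # are left alone and therefore have to match exactly.
    return tuple(PROMOTIONS.get(t, t) for t in arg_types)


print(promote(("int", "float")))
print(promote(("long long", "float*")))
```

With a table like this a caller only needs to probe for the promoted signature, instead of every combination of narrower argument types.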
Re: [Cython] CEP1000: Native dispatch through callables
mark florisson, 14.04.2012 23:15:
> On 14 April 2012 22:02, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 14.04.2012 21:08:
>>> * TBD: Support for Cython-specific constructs like memoryview slices
>>>   (so that arrays with strides and shape can be passed faster than
>>>   passing an {{{"O"}}}).
>>
>> Is this really Cython specific or would a generic Py_buffer struct work?
>
> That could work through simple unboxing wrapper functions, but it
> would add some overhead, specifically because it would have to check
> the buffer's object, and if it didn't exist or was not a memoryview
> object, it would have to create one (checking whether something is a
> memoryview object would also be a pain, as each module has a different
> memoryview type). That could still be feasible for interaction with
> Cython functions from non-Cython code.

Hmm, I don't get it. Isn't the overhead always there when a memory view is
requested in the signature? You'd have to create one for each call and that
seriously hurts the efficiency. Is that a common use case? Why would you
want to do more than passing unboxed buffers?

Stefan
Re: [Cython] CEP1000: Native dispatch through callables
On Sat, Apr 14, 2012 at 2:02 PM, Stefan Behnel wrote: > Hi, > > thanks for writing this up. Comments inline as I read through it. > > Dag Sverre Seljebotn, 14.04.2012 21:08: >> === GIL-less accesss === >> >> It is OK to access the native-call table without holding the GIL. This >> should of course only be used to call functions that state in their >> signature that they don't need the GIL. >> >> This is important for JITted callables who would like to rewrite their >> table as more specializations gets added; if one needs to reallocate >> the table, the old table must linger along long enough that all >> threads that are currently accessing it are done with it. > > The problem here is that changing the table in the face of threaded access > is very likely to introduce race conditions, and the average library out > there won't know when all threads are done with it. I don't think later > modification is a good idea. I agree; a JIT that wants to do this can over-allocate. >> == Native dispatch descriptor == >> >> The final format for the descriptor is not agreed upon yet; this sums >> up the major alternatives. >> >> The descriptor should be a list of specializations/overload > > While overloaded signatures are great for the callee, they make things much > more complicated for the caller. It's no longer just one signature that > either matches or not. Especially when we allow more than one expected > signature, then each of them has to be compared against all exported > signatures. > > We'll have to see what the runtime impact and the impact on the code > complexity is, I guess. The caller could choose to only check the first signature to avoid complexity. I think, however, that overloaded signatures are important, and even checking a dozen is cheaper than going through the Python call. Fused types naturally lead to overloads as well. >> each described by a function pointer and a signature specification >> string, such as "id)i" for {{{int f(int, double)}}}. 
> > How do we deal with object argument types? Do we care on the caller side? > Functions might have alternative signatures that differ in the type of > their object parameters. Or should we handle this inside of the caller and > expect that it's something like a fused function with internal dispatch in > that case? > > Personally, I think there is not enough to gain from object parameters that > we should handle it on the caller side. The callee can dispatch those if > necessary. I don't think we should prohibit the signature from being able to declare arbitrary Cython types. Whether it proves useful is dependent on the library, and is the library writer's choice. > What about signatures that require an object when we have a C typed value? > > What about signatures that require a C typed argument when we have an > arbitrary object value in our call parameters? When considering conversion, one gets into the sticky question of finding the "best" overload. I'd be inclined to not do any conversion in the caller. One can (should) export the object version of the signature as well to avoid the slow Python call. > We should also strip the "self" argument from the parameter list of > methods. That's handled by the attribute lookup before even getting at the > callable. > > >> === Approach 1: Interning/run-time allocated IDs === >> >> >> 1A: Let each overload have a struct >> {{{ >> struct { >> size_t signature_id; >> char *signature; >> void *func_ptr; >> }; >> }}} >> Within each process run, there is a 1:1 > > mapping/relation > >> between {{{signature}}} and >> {{{signature_id}}}. {{{signature_id}}} is allocated by some central >> registry. 
>> >> 1B: Intern the string instead: >> {{{ >> struct { >> char *signature; /* pointer must come from the central registry */ >> void *func_ptr; >> }; >> }}} >> However this is '''not'' trivial, since signature strings can >> be allocated on the heap (e.g., a JIT would do this), so interned strings >> must be memory managed and reference counted. > > Not necessarily, they are really short strings that could just live > forever, stored efficiently by the registry in a series of larger memory > blocks. It would take a while to fill up enough memory with those to become > problematic. Finding an efficiently lookup scheme for them might become > interesting at some point, but that would also take a while. > > I don't expect real-world systems to have to deal with thousands of > different runtime(!) discovered signatures during one interpreter lifetime. > > >> Discussion >> >> '''The cost of comparing a signature''': Comparing a global variable (needle) >> to a value that is guaranteed to already be in cache (candidate match) >> >> '''Pros:''' >> >> * Conceptually simple struct format. >> >> '''Cons:''' >> >> * Requires a registry for interning strings. This must be >> "handshaked" between the implementors of this CEP (probably by >> "first to get at {{{sys.modules["_nativecall"}}
Re: [Cython] CEP1000: Native dispatch through callables
Greg Ewing, 15.04.2012 03:07:
> Dag Sverre Seljebotn wrote:
>
>> if (obj has signature "id)i") {
>
> This is an aside, but is it really necessary to define the
> signature syntax in a way that involves unmatched parens?
> Some editors (such as the one I like to use) get confused
> by this, even when they're inside quotes.
>
> The answer "get a better editor" would be entirely
> appropriate if there were some advantage to this syntax,
> over a non-unbalanced one, but I can't see any.

It wasn't really a proposed syntax, I guess, more of a way to write down an
example. It should be easy to do without any special separator by moving
the return type first, for example.

Also, it's not clear yet if we will actually use such a character syntax at
all.

Stefan
Re: [Cython] CEP1000: Native dispatch through callables
On Sat, Apr 14, 2012 at 11:16 PM, Stefan Behnel wrote:
> Robert Bradshaw, 15.04.2012 07:59:
>> On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote:
>>> There may be a lot of promotion/demotion (you likely only want the
>>> former) combinations, especially for multiple arguments, so perhaps it
>>> makes sense to limit ourselves a bit. For instance for numeric scalar
>>> argument types we could limit to long (and the unsigned counterparts),
>>> double and double complex.
>>>
>>> So char, short and int scalars will be
>>> promoted to long, float to double and float complex to double complex.
>>> Anything bigger, like long long etc will be matched specifically.
>>> Promotions and associated demotions if necessary in the callee should
>>> be fairly cheap compared to checking all combinations or going through
>>> the python layer.
>>
>> True, though this could be a convention rather than a requirement of
>> the spec. Long vs. < long seems natural, but are there any systems
>> where (scalar) float still has an advantage over double?
>>
>> Of course pointers like float* vs double* can't be promoted, so we
>> would still need this kind of type declaration.
>
> Yes, passing data sets as C arrays requires proper knowledge about their
> memory layout on both sides.
>
> OTOH, we are talking about functions that would otherwise be called through
> Python, so this could only apply for buffers anyway. So why not require a
> Py_buffer* as argument for them?

That's certainly our (initial?) usecase, but there's no need to limit
the protocol to this.

- Robert
Re: [Cython] CEP1000: Native dispatch through callables
On 04/15/2012 08:16 AM, Stefan Behnel wrote:
> Robert Bradshaw, 15.04.2012 07:59:
>> On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote:
>>> There may be a lot of promotion/demotion (you likely only want the
>>> former) combinations, especially for multiple arguments, so perhaps it
>>> makes sense to limit ourselves a bit. For instance for numeric scalar
>>> argument types we could limit to long (and the unsigned counterparts),
>>> double and double complex.
>>>
>>> So char, short and int scalars will be
>>> promoted to long, float to double and float complex to double complex.
>>> Anything bigger, like long long etc will be matched specifically.
>>> Promotions and associated demotions if necessary in the callee should
>>> be fairly cheap compared to checking all combinations or going through
>>> the python layer.
>>
>> True, though this could be a convention rather than a requirement of
>> the spec. Long vs. < long seems natural, but are there any systems
>> where (scalar) float still has an advantage over double?
>>
>> Of course pointers like float* vs double* can't be promoted, so we
>> would still need this kind of type declaration.
>
> Yes, passing data sets as C arrays requires proper knowledge about their
> memory layout on both sides.
>
> OTOH, we are talking about functions that would otherwise be called through
> Python, so this could only apply for buffers anyway. So why not require a
> Py_buffer* as argument for them?

Is the proposal to limit the range of types valid for arguments?

I'm a bit wary of throwing this into the mix. We know very little about
the callee, they could decide:

 a) To only export a C function and have an exception-raising __call__
 b) To accept ctypes pointers in their __call__, and C pointers in their
    native-call
 c) They can invent their own use for this!

I think agreeing on a CEP gets a lot simpler, and the result cleaner, if
we focus on "how to describe C functions for the purposes of calling
them" (for various usecases), and leave "conventions for recommended
signatures" for CEP 1001.

In Cython, we could always export a fully-promoted-scalar function first
in the list, and always try to call this first, which would work well
with Cython<->Cython.

BTW, when Travis originally wanted a proposal on the NumPy list he just
wanted it for "a C function"; his idea was something like

    mycapsule = numbaize(f)
    scipy.integrate(mycapsule)

just saying that the fast-callable aspect isn't everything, passing the
function pointer around was how this started.

Dag
Re: [Cython] CEP1000: Native dispatch through callables
Robert Bradshaw, 15.04.2012 08:27:
> On Sat, Apr 14, 2012 at 2:02 PM, Stefan Behnel wrote:
>> While overloaded signatures are great for the callee, they make things much
>> more complicated for the caller. It's no longer just one signature that
>> either matches or not. Especially when we allow more than one expected
>> signature, then each of them has to be compared against all exported
>> signatures.
>>
>> We'll have to see what the runtime impact and the impact on the code
>> complexity is, I guess.
>
> The caller could choose to only check the first signature to avoid
> complexity. I think, however, that overloaded signatures are
> important, and even checking a dozen is cheaper than going through the
> Python call. Fused types naturally lead to overloads as well.

Hmm, maybe it wouldn't even be all that inefficient. If both sides sorted
their signatures by highest efficiency at compile time, the first hit would
cut down the number of signatures that need further comparison to the ones
before the match.

>>> each described by a function pointer and a signature specification
>>> string, such as "id)i" for {{{int f(int, double)}}}.
>>
>> How do we deal with object argument types? Do we care on the caller side?
>> Functions might have alternative signatures that differ in the type of
>> their object parameters. Or should we handle this inside of the caller and
>> expect that it's something like a fused function with internal dispatch in
>> that case?
>>
>> Personally, I think there is not enough to gain from object parameters that
>> we should handle it on the caller side. The callee can dispatch those if
>> necessary.
>
> I don't think we should prohibit the signature from being able to
> declare arbitrary Cython types. Whether it proves useful is dependent
> on the library, and is the library writer's choice.

It leads to very different requirements for the signature encoding/syntax,
though. I think we should only go that route when we think we have to.

>>> * Requires a registry for interning strings. This must be
>>>   "handshaked" between the implementors of this CEP (probably by
>>>   "first to get at {{{sys.modules["_nativecall"]}}} sticks it there"),
>>>   as we can't ship a common dependency library for this CEP.
>>
>> ... which would eventually end up in the stdlib, but could equally well
>> come from PyPI for now. I don't see a problem with that.
>>
>> Using sys.modules (or another global store) instead of an explicit import
>> allows for dependency injection, that's good.
>
> It excludes (or makes it difficult) for non-Python libraries to participate.

True, but that can be helped by providing a library (or header file) that
provides simple C calls for the required setup.

Stefan
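As a rough picture of the "first to get at sys.modules sticks it there" handshake, here is a minimal Python sketch of a shared interning registry. The module name `_nativecall` comes from the CEP draft; the `interned` attribute and both helper names are hypothetical, and a real implementation would live at the C level.

```python
import sys
import types

REGISTRY_NAME = "_nativecall"  # module name taken from the CEP draft


def get_interned_table():
    """Return the process-wide dict of interned signature strings."""
    module = sys.modules.get(REGISTRY_NAME)
    if module is None:
        candidate = types.ModuleType(REGISTRY_NAME)
        candidate.interned = {}  # hypothetical attribute name
        # setdefault keeps whichever module object was registered first.
        module = sys.modules.setdefault(REGISTRY_NAME, candidate)
    return module.interned


def intern_signature(sig):
    """Return the canonical string object for this signature."""
    return get_interned_table().setdefault(sig, sig)


# Two equal strings built at run time map to the same object, so a
# signature match becomes an identity (pointer) comparison.
a = intern_signature("".join(["id", ")i"]))
b = intern_signature("".join(["id", ")i"]))
print(a is b)
```

Because the table hangs off `sys.modules`, a different provider can be injected simply by registering it before any participant asks for it.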
Re: [Cython] CEP1000: Native dispatch through callables
Robert Bradshaw, 15.04.2012 08:32:
> On Sat, Apr 14, 2012 at 11:16 PM, Stefan Behnel wrote:
>> Robert Bradshaw, 15.04.2012 07:59:
>>> On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote:
>>>> There may be a lot of promotion/demotion (you likely only want the
>>>> former) combinations, especially for multiple arguments, so perhaps it
>>>> makes sense to limit ourselves a bit. For instance for numeric scalar
>>>> argument types we could limit to long (and the unsigned counterparts),
>>>> double and double complex.
>>>>
>>>> So char, short and int scalars will be
>>>> promoted to long, float to double and float complex to double complex.
>>>> Anything bigger, like long long etc will be matched specifically.
>>>> Promotions and associated demotions if necessary in the callee should
>>>> be fairly cheap compared to checking all combinations or going through
>>>> the python layer.
>>>
>>> True, though this could be a convention rather than a requirement of
>>> the spec. Long vs. < long seems natural, but are there any systems
>>> where (scalar) float still has an advantage over double?
>>>
>>> Of course pointers like float* vs double* can't be promoted, so we
>>> would still need this kind of type declaration.
>>
>> Yes, passing data sets as C arrays requires proper knowledge about their
>> memory layout on both sides.
>>
>> OTOH, we are talking about functions that would otherwise be called through
>> Python, so this could only apply for buffers anyway. So why not require a
>> Py_buffer* as argument for them?
>
> That's certainly our (initial?) usecase, but there's no need to limit
> the protocol to this.

I think the question here is: is this supposed to be a best effort protocol
for bypassing Python calls, or would it be an error in some situations if
no matching signature can be found?

Stefan
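Read as a best-effort protocol, the caller side could look roughly like the Python sketch below: try the signatures the caller knows, in order of preference, against the callee's exported table, and fall back to the ordinary boxed call when nothing matches. The table layout (a list of signature/function pairs), the name `call_with_fast_path` and the sample function are assumptions for illustration; only the "id)i" and "if)i" strings come from the CEP draft.

```python
def call_with_fast_path(table, wanted, args, boxed_call):
    """Call through the first wanted signature found in table.

    table      -- list of (signature, function) pairs exported by the callee
    wanted     -- signatures the caller can use, most preferred first
    boxed_call -- fallback standing in for PyObject_Call
    """
    for sig in wanted:
        for exported_sig, func in table:
            if exported_sig == sig:
                return sig, func(*args)
    # No match: best effort means falling back, not raising.
    return None, boxed_call(*args)


def f_int_double(i, d):
    """Stand-in for a native int f(int, double)."""
    return int(i * d)


table = [("id)i", f_int_double)]
print(call_with_fast_path(table, ["id)i", "if)i"], (3, 2.5), f_int_double))
print(call_with_fast_path(table, ["if)i"], (3, 2.5), f_int_double))
```

Making a missing match an error instead would only change the last line of the function, which is why the question is about the contract rather than the mechanics.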
Re: [Cython] CEP1000: Native dispatch through callables
Ah, Cython objects. Didn't think of that. More below.

On 04/14/2012 11:02 PM, Stefan Behnel wrote:
> Hi,
>
> thanks for writing this up. Comments inline as I read through it.
>
> Dag Sverre Seljebotn, 14.04.2012 21:08:
>> each described by a function pointer and a signature specification
>> string, such as "id)i" for {{{int f(int, double)}}}.
>
> How do we deal with object argument types? Do we care on the caller side?
> Functions might have alternative signatures that differ in the type of
> their object parameters. Or should we handle this inside of the caller and
> expect that it's something like a fused function with internal dispatch in
> that case?
>
> Personally, I think there is not enough to gain from object parameters that
> we should handle it on the caller side. The callee can dispatch those if
> necessary.
>
> What about signatures that require an object when we have a C typed value?
>
> What about signatures that require a C typed argument when we have an
> arbitrary object value in our call parameters?
>
> We should also strip the "self" argument from the parameter list of
> methods. That's handled by the attribute lookup before even getting at the
> callable.

On 04/15/2012 07:59 AM, Robert Bradshaw wrote:
> It would certainly be useful to have special syntax for memory views
> (after nailing down a well-defined ABI for them) and builtin types.
> Being able to declare something as taking a
> "sage.rings.integer.Integer" could also prove useful, but could result
> in long (and prefix-sharing) signatures, favoring the
> runtime-allocated ids.

I don't think describing Cython objects in this cross-tool CEP would work
nicely, this is for standardized ABIs only (we can't do memoryviews
either until their ABI is standard).

I think I prefer to a) exclude it now, and b) down the line we need
another cross-tool ABI to communicate vtables, and then we could put
that into this CEP now.

I strongly believe we should go with the Go "duck-typing" approach for
interfaces, i.e. it is not the declared name that should be compared but
the method names and signatures.

The only question that needs answering for CEP1000 is: Would this blow
up the signature string enough that interning is the only viable option?

Some strcmp solutions:

 a) Hash each vtable descriptor to 160 bits, and assume the hash is
    unique. Still, a couple of interfaces would blow up the signature
    string a lot.

 b) Modify approach B in CEP 1000 to this: If it is longer than 160 bits,
    take a full cryptographic hash, and just assume there won't be hash
    collisions (like git does). This still saves for short signature
    strings, and avoids interning at the cost of doing 160-bit comparisons.

Both of these require other ways at getting at the actual string data.
But I still like b) above better than interning.

Dag
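Option b) above can be sketched in a few lines of Python. Signatures of up to 160 bits are kept verbatim and padded, while longer ones are replaced by a digest, so every key has the same fixed width. Using SHA-1 is an assumption here, picked only because it is 160 bits wide like git's object ids; the name `signature_key` and the long sample signature are made up for illustration.

```python
import hashlib

KEY_BYTES = 20  # 160 bits


def signature_key(sig: str) -> bytes:
    """Return a fixed-width 160-bit comparison key for a signature."""
    raw = sig.encode("ascii")
    if len(raw) <= KEY_BYTES:
        # Short signatures are stored verbatim, NUL-padded to 160 bits.
        return raw.ljust(KEY_BYTES, b"\0")
    # Longer ones are replaced by a digest (SHA-1 is an assumption),
    # trusting that collisions will not occur in practice.
    return hashlib.sha1(raw).digest()


short_key = signature_key("id)i")
long_key = signature_key("O" * 30 + ")i")  # made-up long signature
print(len(short_key), len(long_key))
print(short_key.startswith(b"id)i"))
```

As noted in the message, a hashed key no longer contains the signature text, so the full string has to be reachable some other way whenever it needs to be inspected.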