[Python-Dev] Enhancement request for PyUnicode proxies
I was directed to post this request to the general Python development community, so hopefully this is on topic. One of the weaknesses of the PyUnicode implementation is that the type is concrete, and there is no option for an abstract proxy string to a foreign source. This is an issue for an API like JPype, in which java.lang.Strings are passed back from Java. Ideally these would be a type derived from the Unicode type str, but that requires transferring the memory from Java to Python immediately, even when the string is large and will never be accessed from within Python. For certain operations like XML parsing this can be prohibitive, so instead of returning a str we return a JString. (There is a separate issue: Java method names and Python method names conflict, so direct inheritance creates some problems.) The JString can of course be transferred to Python space at any time, as both Python Unicode and Java string objects are immutable. However, the CPython API functions that take strings only accept Unicode type objects, which have a concrete implementation. It is possible to extend strings, but as far as I can tell those extensions do not allow for proxying. Thus there is currently no option to proxy to a string representation in another language.

The duck-typed ``__str__`` method is insufficient, as it indicates that an object can become a string, rather than that "this object is effectively a string" for the purposes of the CPython API.

One way to address this is to use the currently outdated concept of READY to extend Unicode objects to other languages. A class like JString would be an unready Unicode object which, when READY is called, transfers the memory from Java, sets up the flags, and sets up a pointer to the code point representation. Unfortunately, the READY concept is scheduled for removal, so the chance to address the need for proxying a Unicode object to another language's representation may be limited.
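The READY-based idea can be illustrated with a toy model in plain C. It does not use the real CPython structures. A proxy holds an opaque foreign handle plus a table of hooks, and every accessor first calls a READY-like step that runs the transfer hook at most once. `ProxyString`, `ProxyOps`, `proxy_ready()` and the fake C-string "foreign source" are all hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: these are hypothetical names, not the CPython API. */
typedef struct ProxyString ProxyString;

typedef struct {
    /* Copy the foreign characters into self->data and set self->length.
       Returns 0 on success, -1 on failure. */
    int (*transfer)(ProxyString *self);
    /* Release the foreign handle (e.g. a JNI global reference); may be NULL. */
    void (*release)(void *handle);
} ProxyOps;

struct ProxyString {
    const ProxyOps *ops;
    void *handle;      /* opaque reference into the foreign runtime */
    int ready;         /* set once the code points live in local memory */
    size_t length;     /* valid only when ready */
    uint32_t *data;    /* valid only when ready */
    int transfers;     /* how many times transfer ran (for illustration) */
};

/* Plays the role of READY: run the transfer hook at most once. */
static int proxy_ready(ProxyString *s) {
    assert(s != NULL && s->ops != NULL);
    if (s->ready)
        return 0;
    if (s->ops->transfer(s) < 0)
        return -1;
    s->ready = 1;
    s->transfers++;
    return 0;
}

/* Every accessor goes through proxy_ready() before touching data. */
static long proxy_length(ProxyString *s) {
    if (proxy_ready(s) < 0)
        return -1;
    return (long)s->length;
}

/* Free the transferred block and let the proxy release its foreign handle. */
static void proxy_free(ProxyString *s) {
    free(s->data);
    s->data = NULL;
    if (s->ops->release != NULL)
        s->ops->release(s->handle);
}

/* Stand-in foreign source: the handle is just a C string here. */
static int fake_transfer(ProxyString *self) {
    const char *src = self->handle;
    size_t n = strlen(src);
    uint32_t *buf = malloc((n ? n : 1) * sizeof *buf);
    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)src[i];
    self->data = buf;
    self->length = n;
    return 0;
}

static const ProxyOps fake_ops = { fake_transfer, NULL };
```

For example, `ProxyString s = { &fake_ops, (void *)"hello", 0, 0, NULL, 0 };` stays unready until the first `proxy_length(&s)`. That call copies the data once, and later calls reuse it. A large string that is never inspected is therefore never copied, which is the property the XML-parsing case needs.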
There may be other ways to accomplish this without using the concept of READY. So long as access to the code points goes through the Unicode API, and the Unicode object can be extended so that the actual code points may be located outside of it, a proxy can still be achieved, provided there are hooks to decide when a transfer should be performed. Generally the transfer only needs to happen once. The key issue is that neither the number of code points nor their kind will be known until the memory is transferred.

Java has much the same problem. Although it defines an interface, java.lang.CharSequence, the actual java.lang.String class is concrete, and almost all API methods take a String rather than the interface, even when the interface would have been adequate. So just as Python has difficulty treating a foreign string class as it would a native one, Java cannot treat a Python string as a native one either. Python strings therefore get represented as a CharSequence, which greatly limits their use.

Summary:

* A string proxy would need the address of the memory in the "wstr" slot, though the code points may be char[], wchar[] or int[] depending on the representation in the proxy.
* API calls that interpret the data would need to check whether the data has been transferred first. If not, they would call the proxy-dependent transfer method, which is responsible for creating a block of code points and setting up the flags (kind, ascii, ready, and compact).
* The allocated memory block would need to call the proxy-dependent destructor to clean up when the string is done.
* It is not clear whether this would have an impact on performance.

Python already has the concept of a string which needs actions before it can be accessed, but this is scheduled for removal. Are there any current plans to address the concept of a proxy string in the PyUnicode API?
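According to the summary, the transfer method must set up the kind and ascii flags, because the number and width of the code points are unknown until the data arrives. The hedged sketch below shows how such a hook might pick the storage width. It uses the rule PEP 393 uses for compact strings: 1 byte per code point up to U+00FF, 2 bytes up to U+FFFF, otherwise 4, with "ascii" meaning every code point is below 0x80. `Packed`, `choose_kind()`, `pack()` and `packed_read()` are hypothetical names in a plain C toy, not CPython internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model of what a transfer hook must compute after copying the data.
   The kind/ascii rule mirrors PEP 393; the struct itself is hypothetical. */
typedef struct {
    int kind;      /* bytes per code point: 1, 2 or 4 */
    int ascii;     /* 1 if every code point is below 0x80 */
    size_t length; /* number of code points */
    void *data;    /* code points stored at width `kind` */
} Packed;

/* Pick the narrowest kind that can hold every code point. */
static int choose_kind(const uint32_t *cp, size_t n, int *ascii) {
    uint32_t max = 0;
    for (size_t i = 0; i < n; i++)
        if (cp[i] > max)
            max = cp[i];
    *ascii = max < 0x80;
    if (max <= 0xFF)
        return 1;
    if (max <= 0xFFFF)
        return 2;
    return 4;
}

/* Copy UCS-4 code points from the foreign side into a block of the chosen kind. */
static int pack(const uint32_t *cp, size_t n, Packed *out) {
    out->kind = choose_kind(cp, n, &out->ascii);
    out->length = n;
    out->data = malloc((n ? n : 1) * (size_t)out->kind);
    if (out->data == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        switch (out->kind) {
        case 1: ((uint8_t *)out->data)[i] = (uint8_t)cp[i]; break;
        case 2: ((uint16_t *)out->data)[i] = (uint16_t)cp[i]; break;
        default: ((uint32_t *)out->data)[i] = cp[i]; break;
        }
    }
    return 0;
}

/* Read code point i back regardless of kind. */
static uint32_t packed_read(const Packed *p, size_t i) {
    assert(i < p->length);
    switch (p->kind) {
    case 1: return ((const uint8_t *)p->data)[i];
    case 2: return ((const uint16_t *)p->data)[i];
    default: return ((const uint32_t *)p->data)[i];
    }
}
```

The full scan in `choose_kind()` is why the kind cannot be known ahead of time. It depends on the largest code point, which only the transferred data reveals.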
___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/BDJAQDPQMVCLCSB3CEM34VPAY666D3M3/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Enhancement request for PyUnicode proxies
I would like to second Steve's suggestion. The requirements for JPype for this to work are pretty minimal: a bit flag for string-like types that is checked by PyString_Check, followed by a call to PyObject_Str() that is guaranteed to return a concrete Unicode object, which is then used throughout the function call. This would not require any additional slots. Unfortunately, this doesn't match the other patterns in Python: if an object passes PyString_Check, why would one need to call a casting function to get the actual string? It could be as simple as a macro PyString_ToUnicode() that calls PyUnicode_Check and, if it passes, returns a new reference, else returns PyObject_Str(). It is then just a small matter for JString, ObjCStr, WinHTString, etc. to set this bit flag when the type is created. The downside is that we end up with an extra reference/dereference in functions that use strings, but given the ownership concerns of a buffer-like protocol, this is really the minimum required.

This does not deal with the ObjC requirement, though, as unlike Java, ObjC has mutable strings. There are a number of parts of the Python API where the string is consumed immediately, where it does not matter whether the string is immutable or mutable. But others, like the hash or dictionary keys, require immutability. So perhaps there also needs to be a PyString_IsImmutable() so that we can prevent accidental use of a mutable string.

I would be happy to help with this effort, but I am in the unfortunate position that the legal department at my employer (DOE/LLNL) has objected to a clause in the PSF Contributor Agreement, which prohibits me from signing it. We also have a policy that prohibits open source contributions to projects that require signing agreements without laboratory legal sign-off, so I am in a bind until I deal with their concerns.
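The flag-plus-conversion idea can be modelled in a few lines of plain C. This is a hedged toy, not CPython. `TF_STRING_LIKE`, `TF_IMMUTABLE`, `string_check()`, `string_to_unicode()` and `string_is_immutable()` are hypothetical stand-ins for the proposed flag bit, PyString_Check, PyString_ToUnicode() and PyString_IsImmutable(), and the `to_str` slot plays the role of PyObject_Str().

```c
#include <assert.h>
#include <stdlib.h>

/* Toy object model, not CPython: the flag bits and function names are
   hypothetical stand-ins for the proposed PyString_* API. */
#define TF_STRING_LIKE (1u << 0) /* "treat me as a string" bit */
#define TF_IMMUTABLE   (1u << 1) /* safe to hash / use as a dict key */

typedef struct Object Object;

typedef struct {
    const char *name;
    unsigned flags;
    Object *(*to_str)(Object *self); /* plays the role of PyObject_Str() */
} Type;

struct Object {
    const Type *type;
    long refcnt;
    const char *text;
};

static const Type UnicodeType = { "str", TF_STRING_LIKE | TF_IMMUTABLE, NULL };

static int is_unicode(const Object *o) { return o->type == &UnicodeType; }

/* Analogue of PyString_Check(): native str or a type that set the bit. */
static int string_check(const Object *o) {
    return is_unicode(o) || (o->type->flags & TF_STRING_LIKE) != 0;
}

/* Analogue of PyString_IsImmutable(). */
static int string_is_immutable(const Object *o) {
    return (o->type->flags & TF_IMMUTABLE) != 0;
}

/* Analogue of PyString_ToUnicode(): always a new reference to a concrete
   str, or NULL if the object is not string-like. */
static Object *string_to_unicode(Object *o) {
    assert(o != NULL);
    if (is_unicode(o)) {
        o->refcnt++;
        return o;
    }
    if (!string_check(o) || o->type->to_str == NULL)
        return NULL;
    return o->type->to_str(o);
}

/* A proxy's conversion: materialize a new concrete str (refcnt 1). */
static Object *proxy_to_str(Object *self) {
    Object *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->type = &UnicodeType;
    s->refcnt = 1;
    s->text = self->text; /* a real proxy would copy the foreign data */
    return s;
}

static const Type JStringType = { "JString", TF_STRING_LIKE | TF_IMMUTABLE, proxy_to_str };
static const Type MutableProxyType = { "ObjCMutableStr", TF_STRING_LIKE, proxy_to_str };
```

Either branch of `string_to_unicode()` hands back a reference the caller must release. That is the "extra reference/dereference" cost mentioned above. A caller that needs a dictionary key would also check `string_is_immutable()` first.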
--Karl

-----Original Message-----
From: Steve Dower
Sent: Monday, January 4, 2021 8:54 AM
To: python-dev@python.org
Subject: [Python-Dev] Re: Enhancement request for PyUnicode proxies

On 12/29/2020 5:23 PM, Antoine Pitrou wrote:
> The third option is to add a distinct "string view" protocol. There
> are peculiarities (such as the fact that different objects may have
> different internal representations - some utf8, some utf16...) that
> make the buffer protocol suboptimal for this.
>
> Also, we probably don't want unicode-like objects to start being
> usable in contexts where a buffer-like object is required (such as
> writing to a binary file, or zlib-compressing a bunch of bytes).

I've had to deal with this problem in the past as well (WinRT HSTRINGs), and this is the approach that would seem to make the most sense to me. Basically, reintroduce PyString_* APIs as an _abstract_ interface to str-like objects. So the first line of every single one can be PyUnicode_Check() followed by calling the _concrete_ PyUnicode_* implementation. And then we develop additional type slots or whatever is necessary for someone to build an equivalent native object.

Most "is this a str" checks can become PyString_Check, provided all the APIs used against the object are abstract (PyObject_* or PyString_*). Those that are going to mess with internals will have to get special treatment.

I don't want to make it all sound too easy, because it probably won't be. But it should be possible to add a viable proxy layer as a set of abstract C APIs to use instead of the concrete ones.
Cheers,
Steve
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TC3BZJX4DGC2WV32AHIX7A57HQNJ2EMO/
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NQSXIJB4GGDYZ6BVEA2IZ4MCAQGCMRFW/
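Steve's abstract layer and Antoine's "string view" point can be combined in a toy model. Each abstract call checks for the concrete str first and uses its representation directly. Otherwise it goes through type slots that let a foreign type keep its own internal representation, here UTF-16 as a Java string would store it. This is a hedged plain C sketch. `StringSlots`, `string_check()`, `string_length()`, `string_read()` and the slot layout are hypothetical, not an existing or proposed CPython API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: all names here are hypothetical. */
typedef struct Obj Obj;

typedef struct {
    size_t (*length)(const Obj *self);               /* in code points */
    uint32_t (*read)(const Obj *self, size_t index); /* code point at index */
} StringSlots;

typedef struct {
    const char *name;
    const StringSlots *as_string; /* NULL unless the type is string-like */
} Type;

struct Obj {
    const Type *type;
    const void *data; /* representation is private to the type */
    size_t units;     /* number of code units in data */
};

/* The concrete str: a UCS-4 array, handled directly by the abstract API. */
static const Type StrType = { "str", NULL };

/* True if a valid surrogate pair starts at u[i]. */
static int is_pair(const uint16_t *u, size_t i, size_t n) {
    return u[i] >= 0xD800 && u[i] <= 0xDBFF && i + 1 < n &&
           u[i + 1] >= 0xDC00 && u[i + 1] <= 0xDFFF;
}

/* Foreign string stored as UTF-16: count code points, not code units. */
static size_t utf16_length(const Obj *o) {
    const uint16_t *u = o->data;
    size_t count = 0;
    for (size_t i = 0; i < o->units; i += is_pair(u, i, o->units) ? 2 : 1)
        count++;
    return count;
}

/* Walk to the index-th code point and decode a surrogate pair if present. */
static uint32_t utf16_read(const Obj *o, size_t index) {
    const uint16_t *u = o->data;
    size_t i = 0;
    while (index > 0) {
        i += is_pair(u, i, o->units) ? 2 : 1;
        index--;
    }
    if (is_pair(u, i, o->units))
        return 0x10000 + (((uint32_t)u[i] - 0xD800) << 10) + ((uint32_t)u[i + 1] - 0xDC00);
    return u[i];
}

static const StringSlots utf16_slots = { utf16_length, utf16_read };
static const Type JStringType = { "JString", &utf16_slots };

/* Abstract API: concrete fast path first, then the type's slots. */
static int string_check(const Obj *o) {
    return o->type == &StrType || o->type->as_string != NULL;
}

static long string_length(const Obj *o) {
    if (o->type == &StrType)
        return (long)o->units;
    if (o->type->as_string != NULL)
        return (long)o->type->as_string->length(o);
    return -1; /* not string-like */
}

static long string_read(const Obj *o, size_t index) {
    assert(string_check(o) && (long)index < string_length(o));
    if (o->type == &StrType)
        return (long)((const uint32_t *)o->data)[index];
    return (long)o->type->as_string->read(o, index);
}
```

The slots deal only in code points, so a string-like object never shows up as a byte buffer. That matches Antoine's concern about such objects becoming usable wherever a buffer is accepted.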
[Python-Dev] Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
Having used heap types extensively for JPype, I believe that converting all types to heap types would be a great benefit. There are still minor rough spots where a static type can do things that a heap type cannot (for example, a static type can derive from a type that is marked final, such as function, but a heap type cannot). But generally I found heap types to be much more flexible. I found that heap types were better in concept than static types, but because the majority of the API (and the examples of using the C API) were static, the heap type paths were less exercised. I eventually puzzled out most of the mysteries, but having everything be the same (except for old static types, which should be marked as immortal) likely has a lot of side benefits.

Of course, the other issue that I have with heap types is that they currently lack the concept of metaclasses. Thus there are things that you can do from the Python language that you can't do from the C API. See https://bugs.python.org/issue42617

The downside, of course, is that there are a lot of calls in the C API that assume a static type has a fixed address. Perhaps those calls could all become macros that evaluate to the address of the heap type. But that is just my 2 cents.

--Karl

-----Original Message-----
From: Neil Schemenauer
Sent: Tuesday, January 12, 2021 10:17 AM
To: Victor Stinner
Cc: Python Dev
Subject: [Python-Dev] Re: Heap types (PyType_FromSpec) must fully implement the GC protocol

On 2021-01-12, Victor Stinner wrote:
> It seems like a safer approach is to continue the work on
> bpo-40077: "Convert static types to PyType_FromSpec()".

I agree that trying to convert static types is a good idea. Another possible bonus might be that we can gain some performance by integrating garbage collection with the Python object memory allocator. Static types frustrate that effort.

Could we have something easier to use than PyType_FromSpec(), for the purposes of converting existing code?
I was thinking of something like:

    static PyTypeObject Foo_TypeStatic = {
        /* ... */
    };
    static PyTypeObject *Foo_Type;

    PyMODINIT_FUNC
    PyInit_foo(void)
    {
        /* PyType_FromStatic() is the proposed API, not an existing one */
        Foo_Type = PyType_FromStatic(&Foo_TypeStatic);
        /* ... */
    }

PyType_FromStatic() would return a new heap type, created by copying the static type. The static type could be marked as being unusable (e.g. with a type flag).
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RPG2TRQLONM2OCXKPVCIDKVLQOJR7EUU/
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DV4SPP2TTXGYMTMRMEO6TG5W7XPZKPXX/
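Neil's copy-from-static proposal and Karl's suggestion of macros that resolve to the heap type's address can be sketched together in a toy model. This is plain C, not real PyTypeObject code. `TypeObject`, `type_from_static()`, `TPFLAG_HEAPTYPE`, `TPFLAG_RETIRED`, `FOO_TYPE` and `Foo_Check()` are hypothetical names assumed for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the PyType_FromStatic() idea; TypeObject, the flag bits and
   type_from_static() are hypothetical stand-ins, not CPython definitions. */
#define TPFLAG_HEAPTYPE (1u << 0)
#define TPFLAG_RETIRED  (1u << 1) /* static template must no longer be used */

typedef struct {
    const char *name;
    unsigned flags;
    size_t basicsize;
} TypeObject;

typedef struct {
    const TypeObject *ob_type;
} Object;

/* Copy a static template into a new heap type and retire the template. */
static TypeObject *type_from_static(TypeObject *tmpl) {
    assert(!(tmpl->flags & TPFLAG_RETIRED)); /* each template converts once */
    TypeObject *heap = malloc(sizeof *heap);
    if (heap == NULL)
        return NULL;
    *heap = *tmpl; /* every slot is copied as-is */
    heap->flags |= TPFLAG_HEAPTYPE;
    tmpl->flags |= TPFLAG_RETIRED; /* marks the stale static copy as unusable */
    return heap;
}

static TypeObject Foo_TypeStatic = { "foo.Foo", 0, 32 };
static TypeObject *Foo_Type;

/* Code that used to take the address of a static type goes through a macro,
   so call sites keep the same shape once the type lives on the heap. */
#define FOO_TYPE (Foo_Type)
#define Foo_Check(op) ((op)->ob_type == FOO_TYPE)

static int foo_module_init(void) {
    Foo_Type = type_from_static(&Foo_TypeStatic);
    return Foo_Type != NULL ? 0 : -1;
}
```

Because the old static definition can be reused almost unchanged as a template, converting existing code mostly means replacing `&Foo_Type` at call sites with the macro. That is the appeal of this route over rewriting each type as a PyType_Spec.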