[Python-Dev] Enhancement request for PyUnicode proxies

2020-12-25 Thread Nelson, Karl E. via Python-Dev
I was directed to post this request to the general Python development community 
so hopefully this is on topic.

One of the weaknesses of the PyUnicode implementation is that the type is 
concrete and there is no option for an abstract proxy string to a foreign 
source.  This is an issue for an API like JPype, in which java.lang.String 
objects are passed back from Java.  Ideally these would be a type derived from 
the Unicode type str, but that requires transferring the memory immediately 
from Java to Python even when the string behind that handle is large and will 
never be accessed from within Python.  For certain operations like XML parsing 
this can be prohibitive, so instead of returning a str we return a JString.  
(There is a separate issue that Java method names and Python method names 
conflict, so direct inheritance creates some problems.)

The JString type can of course be transferred to Python space at any time, as 
both Python Unicode and Java string objects are immutable.  However, the 
CPython API functions which take strings only accept Unicode type objects, 
which have a concrete implementation.  It is possible to extend strings, but 
those extensions do not allow for proxying as far as I can tell.  Thus there is 
currently no option to proxy to a string representation in another language.  
Using the duck-typed ``__str__`` method is insufficient, as this indicates 
that an object can become a string, rather than "this object is effectively a 
string" for the purposes of the CPython API.

One way to address this is to use the now-outdated concept of READY to extend 
Unicode objects to other languages.  A class like JString would be an unready 
Unicode object which, when READY is called, transfers the memory from Java, 
sets up the flags, and sets up a pointer to the code point representation.  
Unfortunately the READY concept is scheduled for removal, and thus the chance 
to address the need for proxying a Unicode object to another language's 
representation may be limited.  There may be other methods to accomplish this 
without using the concept of READY.  So long as access to the code points goes 
through the Unicode API, and the Unicode object can be extended such that the 
actual code points may be located outside of the Unicode object, a proxy can 
still be achieved if there are hooks in it to decide when a transfer should be 
performed.  Generally the transfer only needs to happen once, but the key issue 
is that neither the number of code points nor their kind will be known until 
the memory is transferred.

Java has much the same problem.  Although it defines an interface, 
"java.lang.CharSequence", the actual "java.lang.String" class is concrete 
and almost all API methods take a String rather than the base interface, even 
when the base interface would have been adequate.  Thus, just as Python has 
difficulty treating a foreign string class as it would a native one, Java 
cannot treat a Python string as a native one either.  So Python strings get 
represented as a CharSequence type, which greatly limits their use.

Summary:


  *   A String proxy would need the address of the memory in the "wstr" slot, 
      though the code points may be char[], wchar[] or int[] depending on the 
      representation in the proxy.
  *   API calls that interpret the data would need to check whether the data 
      has been transferred first; if not, they would call the proxy-dependent 
      transfer method, which is responsible for creating a block of code points 
      and setting up the flags (kind, ascii, ready, and compact).
  *   Releasing the allocated memory block would need to call the 
      proxy-dependent destructor to clean up when the string is done.
  *   It is not clear whether this would have an impact on performance.  
      Python already has the concept of a string which needs actions before it 
      can be accessed, but this is scheduled for removal.
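
As a rough illustration of the summary above, here is a minimal Python-level 
sketch of the transfer-on-first-access idea.  It is only a toy model, not 
CPython's implementation: the class ProxyString, its transfer/release 
callbacks, and the close() method are all hypothetical names, and the "kind" 
value simply mirrors the 1/2/4 byte-width convention.

```python
class ProxyString:
    """Toy model of a string proxy; every name here is hypothetical."""

    def __init__(self, transfer, release):
        self._transfer = transfer  # proxy-dependent transfer method
        self._release = release    # proxy-dependent destructor
        self._data = None          # code points not transferred yet
        self.kind = None           # unknown until the transfer happens
        self.ascii = None
        self.transfers = 0

    def _ensure_ready(self):
        # Every accessor goes through this hook; the transfer runs only once.
        if self._data is None:
            self._data = self._transfer()
            self.transfers += 1
            widest = max(map(ord, self._data), default=0)
            self.kind = 1 if widest < 0x100 else 2 if widest < 0x10000 else 4
            self.ascii = widest < 0x80
        return self._data

    def __len__(self):
        return len(self._ensure_ready())

    def __getitem__(self, index):
        return self._ensure_ready()[index]

    def __str__(self):
        return self._ensure_ready()

    def close(self):
        # Stand-in for the destructor call when the string is done.
        self._release()
        self._data = None


# Hypothetical foreign source: a callable standing in for a Java string handle.
released = []
proxy = ProxyString(lambda: "h\u00e9llo", lambda: released.append("freed"))
print(proxy.transfers)   # 0: nothing has been copied yet
print(len(proxy))        # 5: first access triggers the transfer
print(proxy.kind)        # 1: widest code point fits in one byte
```

Note that the length and the kind are only available after the first access, 
which is exactly the difficulty described above.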

Are there any plans currently to address the concept of a proxy string in the 
PyUnicode API?


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BDJAQDPQMVCLCSB3CEM34VPAY666D3M3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Enhancement request for PyUnicode proxies

2021-01-04 Thread Nelson, Karl E. via Python-Dev
I would like to second Steve's suggestion.

The requirements for JPype for this to work are pretty minimal.  All that is 
needed is a bit flag for "string-like" types that is checked by 
PyString_Check, followed by a call to PyObject_Str(), which would be 
guaranteed to return a concrete Unicode object that is then used throughout 
the function call.  This would not require any additional slots.  
Unfortunately, this doesn't match the other patterns in Python: if an object 
passes PyString_Check, why would one need to call a casting function to get 
the actual string?  It could be as simple as making a macro 
PyString_ToUnicode() that calls PyUnicode_Check and, if it passes, creates a 
new reference, else returns PyObject_Str().  It is then just a small matter 
for JString, ObjCStr, WinHTString, etc. to set this bit flag when the type is 
created.  The downside of this is that we end up with an extra 
reference/dereference in string-using functions, but given the ownership 
concerns of a buffer-like protocol this is really the minimum required.
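
The flag-plus-conversion idea can be modeled at the Python level roughly as 
follows.  This is only a sketch under stated assumptions: string_check() and 
to_unicode() stand in for the proposed PyString_Check and 
PyString_ToUnicode(), neither of which exists today, and the class attribute 
string_like is a hypothetical stand-in for the proposed type bit flag.

```python
def string_check(obj):
    """Model of the proposed PyString_Check: str, or a type carrying the flag."""
    return isinstance(obj, str) or getattr(type(obj), "string_like", False)


def to_unicode(obj):
    """Model of the proposed PyString_ToUnicode()."""
    if isinstance(obj, str):
        # PyUnicode_Check passes: hand back the same object (a new reference).
        return obj
    if not string_check(obj):
        raise TypeError(f"expected a string-like object, got {type(obj).__name__}")
    # Otherwise fall back to PyObject_Str(); a proxy would transfer here.
    return str(obj)


class JString:
    """Hypothetical proxy type; the flag is set when the type is created."""
    string_like = True

    def __init__(self, fetch):
        self._fetch = fetch       # callable standing in for the Java handle

    def __str__(self):
        return self._fetch()


def shout(text):
    # A "string-using function": convert once, then use the concrete str.
    text = to_unicode(text)
    return text.upper()


print(shout("abc"))                      # ABC
print(shout(JString(lambda: "xyz")))     # XYZ
```

A function written this way holds a concrete str for its whole body, so the 
proxy is consulted at most once per call.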

This does not deal with the ObjC requirement, though, as unlike Java, ObjC has 
mutable strings.  There are a number of parts of the Python API where the 
string is consumed immediately, where it does not matter whether the string is 
mutable or immutable.  But others, like hashing or dictionary keys, require 
immutability.  So perhaps there also needs to be a PyString_IsImmutable() so 
that we can prevent accidental usage of a mutable string.

I would be happy to help with this effort, but I am in the unfortunate position 
that the legal department at my employer (DOE/LLNL) has objected to some clause 
in the PSF Contributor Agreement thus prohibiting me from signing it.   We also 
have a policy that prohibits open source contributions to projects that require 
signing agreements without laboratory legal sign off so I am in a bind until 
such time as I deal with their concerns.

--Karl

-----Original Message-----
From: Steve Dower  
Sent: Monday, January 4, 2021 8:54 AM
To: python-dev@python.org
Subject: [Python-Dev] Re: Enhancement request for PyUnicode proxies

On 12/29/2020 5:23 PM, Antoine Pitrou wrote:
> The third option is to add a distinct "string view" protocol.  There 
> are peculiarities (such as the fact that different objects may have 
> different internal representations - some utf8, some utf16...) that 
> make the buffer protocol suboptimal for this.
> 
> Also, we probably don't want unicode-like objects to start being 
> usable in contexts where a buffer-like object is required (such as 
> writing to a binary file, or zlib-compressing a bunch of bytes).

I've had to deal with this problem in the past as well (WinRT HSTRINGs), and 
this is the approach that would seem to make the most sense to me.

Basically, reintroduce PyString_* APIs as an _abstract_ interface to str-like 
objects.

So the first line of every single one can be PyUnicode_Check() followed by 
calling the _concrete_ PyUnicode_* implementation. And then we develop 
additional type slots or whatever is necessary for someone to build an 
equivalent native object.

Most "is this a str" checks can become PyString_Check, provided all the APIs 
used against the object are abstract (PyObject_* or PyString_*). 
Those that are going to mess with internals will have to get special treatment.

I don't want to make it all sound too easy, because it probably won't be. But 
it should be possible to add a viable proxy layer as a set of abstract C APIs 
to use instead of the concrete ones.

Cheers,
Steve
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TC3BZJX4DGC2WV32AHIX7A57HQNJ2EMO/
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NQSXIJB4GGDYZ6BVEA2IZ4MCAQGCMRFW/


[Python-Dev] Re: Heap types (PyType_FromSpec) must fully implement the GC protocol

2021-01-12 Thread Nelson, Karl E. via Python-Dev
Having used heap types extensively for JPype, I believe that converting all 
types to heap types would be a great benefit.  There are still minor rough 
spots in which a static type can do things that heap types cannot (for 
example, a static type can derive from a type which is marked final, such as 
function, but a heap type cannot).  But generally I found heap types to be 
much more flexible.  I found that heap types were better in concept than 
static types, but because the majority of the API (and the examples of using 
the C API) were static, the heap type paths were less exercised.  I eventually 
puzzled out most of the mysteries, but having everything be the same (except 
for old static types, which should be marked as immortal) likely has a lot of 
side benefits.

Of course, the other issue that I have with heap types is that they currently 
lack the concept of metaclasses.  Thus there are things that you can do from 
the Python language that you can't do from the C API.  See...

https://bugs.python.org/issue42617
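
For reference, the Python-language side of that comparison looks roughly like 
the sketch below: a class can be given a metaclass either with the class 
statement or by calling the metaclass directly.  The names Meta, Base and 
Derived are purely illustrative and are not taken from JPype or from the issue 
report.

```python
class Meta(type):
    """Illustrative metaclass that tags every class it creates."""

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, namespace, **kwargs)
        cls.created_by = mcls.__name__
        return cls


# Declared with the class statement.
class Base(metaclass=Meta):
    pass


# Created dynamically by calling the metaclass, much as an extension would
# like to do when building a type at run time.
Derived = Meta("Derived", (Base,), {"answer": 42})

print(type(Base).__name__)      # Meta
print(type(Derived).__name__)   # Meta
print(Derived.created_by)       # Meta
```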

The downside, of course, is that there are a lot of calls in the C API that 
assume that a static type has a fixed address.  Perhaps those could all become 
macros which evaluate to the address of the heap type.

But that is just my 2 cents.

--Karl

-----Original Message-----
From: Neil Schemenauer  
Sent: Tuesday, January 12, 2021 10:17 AM
To: Victor Stinner 
Cc: Python Dev 
Subject: [Python-Dev] Re: Heap types (PyType_FromSpec) must fully implement the 
GC protocol

On 2021-01-12, Victor Stinner wrote:
> It seems like a safer approach is to continue the work on
> bpo-40077: "Convert static types to PyType_FromSpec()".

I agree that trying to convert static types is a good idea.  Another possible 
bonus might be that we can gain some performance by integrating garbage 
collection with the Python object memory allocator.  Static types frustrate 
that effort.

Could we have something easier to use than PyType_FromSpec(), for the purposes 
of converting existing code?  I was thinking of something like:

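static PyTypeObject Foo_TypeStatic = {
    /* ... existing static type definition ... */
};
static PyTypeObject *Foo_Type;

PyMODINIT_FUNC
PyInit_foo(void)
{
    /* PyType_FromStatic() is the proposed helper; it does not exist yet. */
    Foo_Type = PyType_FromStatic(&Foo_TypeStatic);
    /* ... create and return the module object as usual ... */
}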


The PyType_FromStatic() would return a new heap type, created by copying the 
static type.  The static type could be marked as being unusable (e.g. with a 
type flag).
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/RPG2TRQLONM2OCXKPVCIDKVLQOJR7EUU/
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DV4SPP2TTXGYMTMRMEO6TG5W7XPZKPXX/