[Python-Dev] Re: Proliferation of tstate arguments.

2020-03-21 Thread Nathaniel Smith
On Fri, Mar 20, 2020 at 11:27 AM Victor Stinner  wrote:
> I would prefer to continue to experiment passing tstate explicitly in
> internal C APIs until most blocker issues will be fixed. Once early
> work on running two subinterpreters in parallel will start working
> (one "GIL" per interpreter), I will be more open to reconsider using a
> TLS variable.

The PEP for parallel subinterpreters hasn't been accepted yet either, right?

> "Inefficient signal handling in multithreaded applications"
> https://bugs.python.org/issue40010

CPython's current signal handling architecture basically assumes that
signals are always delivered to the main thread. (Fortunately, on real
systems, this is almost always true.) In particular, it assumes that
if a signal arrives while the main thread is blocked in a
long-running syscall, then the syscall will be interrupted, which is
only true when the signal is delivered to the main thread. AFAICT if
we really care about off-main-thread signals, then the only way to
handle them properly is for the signal handler to detect when they
happen, and redeliver the signal to the main thread using
pthread_kill, and then let the main thread set its own eval_breaker
etc.
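
As a rough illustration of the redelivery idea only (this is not CPython's
C-level handler), Python exposes the same primitive as signal.pthread_kill.
The sketch assumes a Unix system and a call from the main thread; the
function name redeliver_to_main_thread and the choice of SIGUSR1 are made
up for the example.

```python
import signal
import threading
import time


def redeliver_to_main_thread(signum=signal.SIGUSR1, timeout=5.0):
    """Direct `signum` at the main thread from a worker thread (Unix only)."""
    seen = []

    def handler(received, frame):
        # Record the signal number and whether we run in the main thread.
        in_main = threading.current_thread() is threading.main_thread()
        seen.append((received, in_main))

    # signal.signal() may only be called from the main thread.
    previous = signal.signal(signum, handler)
    try:
        main_ident = threading.main_thread().ident
        # The worker stands in for a thread that notices the signal and
        # forwards it to the main thread.
        worker = threading.Thread(
            target=signal.pthread_kill, args=(main_ident, signum)
        )
        worker.start()
        worker.join()
        # Poll briefly in case the Python-level handler has not run yet.
        deadline = time.monotonic() + timeout
        while not seen and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Restore whatever handler was installed before.
        signal.signal(signum, signal.SIG_DFL if previous is None else previous)
    return seen
```

Because the signal is aimed at the main thread, a syscall the main thread
is blocked in gets interrupted, which is the property the paragraph above
relies on.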

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FTVIAXHDHUNQWLBZQ4YIQXTFFDZ762GL/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Rob Cliffe via Python-Dev



On 20/03/2020 22:21, Victor Stinner wrote:
> > Motivating examples from the Python standard library
> >
> > The examples below demonstrate how the proposed methods can make code
> > one or more of the following: (...)
>
> IMO there are too many examples. For example, refactor.py and
> c_annotations.py are more or less the same. Just keep refactor.py.
>
> Overall, 2 or 3 examples should be enough.

In which case adding something like

`There were many other such examples in the stdlib.`

would make the PEP more compelling.

Rob Cliffe
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/4U555WKFSQBK7PV34KIA2W52P5ECF723/


[Python-Dev] Re: Proliferation of tstate arguments.

2020-03-21 Thread Antoine Pitrou
On Fri, 20 Mar 2020 19:24:22 +0100
Victor Stinner  wrote:
> 
> One good example is Py_AddPendingCall(). The documentation says that
> it's safe to call it without holding the GIL. Except that right now,
> there is no reliable way to get the correct interpreter in this case
> (correct me if I'm wrong!).

Define what "the correct interpreter" is?

The only way to solve this conundrum IMHO is to add a
`Py_AddPendingCallEx(PyInterpreterState*)`.

> The function uses
> PyGILState_GetThisThreadState() which may return a Python thread state
> of the wrong interpreter :-( Again, the PyGILState API should be fixed
> to support subinterpreters.

Similarly, the only possible fix is to add a per-interpreter GILState
API (with a per-interpreter TLS variable under the hood).
The caller knows which interpreter context it wants to run Python code
in, so just let it pass that information to the GILState API.

The most annoying part is what to do with the legacy GILState API.

Regards

Antoine.
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/QGKPIDP2AWODA7KBDJBVP5TIEFOGXTRD/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread musbur
On Fri, 20 Mar 2020 20:49:12 -
"Dennis Sweeney"  wrote:

> exactly same way (as a character set) in each case. Looking at how
> the argument is used, I'd argue that ``lstrip``/``rstrip``/``strip``
> are much more similar to each other than they are to the proposed
> methods

Correct, but I don't like the word "cut" because it suggests that
something is cut into pieces which can be used later separately.

I'd propose to use "trim" instead of "cut" because it makes clear that
something is cut off and discarded, and it is clearly different from
"strip".
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/ZPL56AAEOQ7P7V4LMMFPMWXFPNJ6HFAR/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Ned Batchelder

On 3/20/20 9:34 PM, Cameron Simpson wrote:
> On 20Mar2020 13:57, Eric Fahlgren wrote:
> > On Fri, Mar 20, 2020 at 11:56 AM Dennis Sweeney wrote:
> > > If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then
> > > ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has
> > > been removed.  If ``s`` does not have ``pre`` as a prefix, an
> > > unchanged copy of ``s`` is returned.  In summary, ``s.cutprefix(pre)``
> > > is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
> >
> > The second sentence above unambiguously states that cutprefix returns
> > 'an unchanged *copy*', but the example contradicts that and shows that
> > 'self' may be returned and not a copy.  I think it should be reworded to
> > explicitly allow the optimization of returning self.
>
> My versions of these (plain old functions) return self if unchanged,
> and are explicitly documented as doing so.
>
> This has the concrete advantage that one can test for nonremoval of
> the suffix with "is", which is very fast, instead of == which may not be.
>
> So one writes (assuming methods):
>
>    prefix = cutsuffix(s, 'abc')
>    if prefix is s:
>        ... no change
>    else:
>        ... definitely changed, s != prefix also
>
> I am explicitly in favour of returning self if unchanged.

Why be so prescriptive? The semantics of these functions should be about
what the resulting string contains.  Leave it to implementors to decide
when it is OK to return self or not.
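
For reference, the "roughly equivalent" expression quoted above can be
written out as plain functions. This is only a sketch: cutprefix and
cutsuffix are the draft names used in this thread, written here as
free functions rather than str methods, and the empty-suffix guard is
an addition needed to make the slicing version correct.

```python
def cutprefix(s, pre):
    """Return s without a leading pre; return s itself if there is no match."""
    # Mirrors the draft's expression: s[len(pre):] if s.startswith(pre) else s
    return s[len(pre):] if s.startswith(pre) else s


def cutsuffix(s, suf):
    """Return s without a trailing suf; return s itself if there is no match."""
    # The `suf and` guard matters: with an empty suffix, s[:-0] is s[:0],
    # which would wrongly produce the empty string.
    return s[:-len(suf)] if suf and s.endswith(suf) else s


print(cutprefix("prefix: value", "prefix: "))  # value
print(cutsuffix("archive.tar.gz", ".gz"))      # archive.tar
print(cutsuffix("abc", ""))                    # abc
```

Nothing here promises whether the unchanged case hands back the same
object or an equal one; only the contents of the result are specified.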


--Ned.
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/G3VIIQIGC4LTCEEV43S44RVDCGJ7AV42/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Eric V. Smith

On 3/21/2020 11:20 AM, Ned Batchelder wrote:
> On 3/20/20 9:34 PM, Cameron Simpson wrote:
> > This has the concrete advantage that one can test for nonremoval of
> > the suffix with "is", which is very fast, instead of == which may not
> > be.
> >
> > So one writes (assuming methods):
> >
> >    prefix = cutsuffix(s, 'abc')
> >    if prefix is s:
> >        ... no change
> >    else:
> >        ... definitely changed, s != prefix also
> >
> > I am explicitly in favour of returning self if unchanged.
>
> Why be so prescriptive? The semantics of these functions should be
> about what the resulting string contains.  Leave it to implementors to
> decide when it is OK to return self or not.


The only reason I can think of is to enable the test above: did a
suffix/prefix removal take place? That seems like a useful thing. I
think if we don't specify the behavior one way or the other, people are
going to rely on CPython's behavior here, consciously or not.


Is there some Python implementation that would have a problem with the
"is" test, if we were being this prescriptive? Honest question.


Of course this would open the question of what to do if the prefix is
the empty string. Since "'foo'.startswith('')" is True, maybe we'd
have to return a copy in that case. It would be odd to have
"s.startswith('')" be True, but "s.cutprefix('') is s" also be True. Or,
since there's already talk in the PEP about what happens if the
prefix/suffix is the empty string, we could add more details there if we
adopt the "is" behavior. Like "if the result is the same object as
self, it means either the prefix is the empty string, or self didn't
start with the prefix".


Eric

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/HYSZSIAZA7QNPR43OCBXPWPAZKYTXB7L/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Victor Stinner
Well, if CPython is modified to implement tagged pointers and supports
storing short strings (a few latin1 characters) as a pointer, it may
become harder to keep the same behavior for "x is y" where x and y are
strings.

Victor

On Sat, 21 Mar 2020 at 17:23, Eric V. Smith wrote:
> Is there some python implementation that would have a problem with the
> "is" test, if we were being this prescriptive? Honest question.



-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/XA4T4ORWQE7LHZYS3WDGYDYNEEZXJMZN/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Eric V. Smith

On 3/21/2020 12:39 PM, Victor Stinner wrote:
> Well, if CPython is modified to implement tagged pointers and supports
> storing short strings (a few latin1 characters) as a pointer, it may
> become harder to keep the same behavior for "x is y" where x and y are
> strings.


Good point. And I guess it's still a problem for interned strings, since 
even a copy could be the same object:


>>> s = 'for'
>>> s[:] is 'for'
True

So I now agree with Ned, we shouldn't be prescriptive here, and we 
should explicitly say in the PEP that there's no way to tell if the 
strip/cut/whatever took place, other than comparing via equality, not 
identity.
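
A small sketch of what "comparing via equality, not identity" looks like
in practice. The helper name was_changed is hypothetical; the point is
only that two equal strings may or may not be the same object, so the
check never uses "is".

```python
def was_changed(before, after):
    """Report a change by comparing contents, never object identity."""
    return before != after


s = "for"
copy = s[:]                          # equal to s; may or may not be s itself
rebuilt = "".join(["f", "o", "r"])   # also equal to s, built at run time

print(was_changed(s, copy))      # False
print(was_changed(s, rebuilt))   # False
print(was_changed(s, s[1:]))     # True
```

Whether `copy is s` or `rebuilt is s` holds is left to the implementation,
which is why the sketch does not print it.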


Eric



Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/WDDY4Z6B775ZUKBXDJT57Q5REY2ZGSXN/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Victor Stinner
In that case, the PEP should advise using .startswith() or .endswith()
explicitly if the caller needs to know whether the string is going to be
modified. Example:

modified = False
# O(n) complexity where n=len("prefix:")
if line.startswith("prefix:"):
    line = line.cutprefix("prefix: ")
    modified = True

It should be more efficient than:

old_line = line
line = line.cutprefix("prefix: ")
modified = (line != old_line)  # O(n) complexity where n=len(line)

since the checked prefix is usually way shorter than the whole string.

Victor

On Sat, 21 Mar 2020 at 17:45, Eric V. Smith wrote:
> So I now agree with Ned, we shouldn't be prescriptive here, and we
> should explicitly say in the PEP that there's no way to tell if the
> strip/cut/whatever took place, other than comparing via equality, not
> identity.



-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at 
https://mail.python.org/archives/list/pyth

[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Eric V. Smith

On 3/21/2020 12:50 PM, Victor Stinner wrote:
> In that case, the PEP should advise using .startswith() or .endswith()
> explicitly if the caller needs to know whether the string is going to be
> modified. Example:
>
> modified = False
> # O(n) complexity where n=len("prefix:")
> if line.startswith("prefix:"):
>     line = line.cutprefix("prefix: ")
>     modified = True
>
> It should be more efficient than:
>
> old_line = line
> line = line.cutprefix("prefix: ")
> modified = (line != old_line)  # O(n) complexity where n=len(line)
>
> since the checked prefix is usually way shorter than the whole string.


Agreed (except the string passed to startswith should be the same as the 
one used in cutprefix!).
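
One way to keep the two strings in sync is to pass the prefix once and
use it for both the test and the cut. A minimal sketch, assuming a
hypothetical helper named cut_prefix_checked and plain slicing in place
of the proposed method:

```python
def cut_prefix_checked(line, prefix):
    """Return (new_line, modified), testing and cutting with the same prefix."""
    # `prefix and` treats an empty prefix as "nothing removed"; that is a
    # choice made for this sketch, not something the thread settled.
    if prefix and line.startswith(prefix):
        # Cost of the check is O(len(prefix)), not O(len(line)).
        return line[len(prefix):], True
    return line, False


line, modified = cut_prefix_checked("prefix: value", "prefix: ")
print(line, modified)  # value True

line, modified = cut_prefix_checked("other: value", "prefix: ")
print(line, modified)  # other: value False
```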


Eric



Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/APEHOJXRWSC6LU2A7FPFNLO352WMHAGU/


[Python-Dev] Re: Changing layout of f_localsplus in frame objects

2020-03-21 Thread Skip Montanaro
> So far, my second go-round is proceeding better (fingers crossed). I
> have added a new slot to the _frame struct (f_cellvars), initialized
> once at creation, then referenced elsewhere. I'm also rerunning the
> test suite more frequently. Once I've tweaked everything, all that
> will remain (in theory) to effect my change is to tweak the
> initialization of f_valuestack and f_cellvars. (If an extra slot is
> determined to be too space-expensive, a macro or inline function would
> suffice.)

Found and fixed the last holdout (a SETLOCAL call in ceval.c). For the
curious, I've pushed the change to my fork:

https://github.com/smontanaro/cpython/commit/318f16ff76e91e665b779e3b478a4406d0a9c0ec

As I expected, almost all the changes were in frameobject.c. The other
changes were mostly just to remove no longer needed pointer
arithmetic.

Skip
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/US7OA44U7I6VTOZHUPCAKSXENLP4KJD3/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Steven D'Aprano
On Sat, Mar 21, 2020 at 12:15:21PM -0400, Eric V. Smith wrote:
> On 3/21/2020 11:20 AM, Ned Batchelder wrote:

> >Why be so prescriptive? The semantics of these functions should be 
> >about what the resulting string contains.  Leave it to implementors to 
> >decide when it is OK to return self or not.

I agree with Ned -- whether the string object is returned unchanged or a 
copy is an implementation decision, not a language decision.


[Eric]
> The only reason I can think of is to enable the test above: did a 
> suffix/prefix removal take place? That seems like a useful thing.

We don't make this guarantee about string identity for any other string 
method, and CPython's behaviour varies from method to method:

py> s = 'a b c'
py> s is s.strip()
True
py> s is s.lower()
False

and version to version:

py> s is s.replace('a', 'a')  # 2.7
False
py> s is s.replace('a', 'a')  # 3.5
True

I've never seen anyone relying on this behaviour, and I don't expect 
these new methods will change that. Thinking that `is` is another way of 
writing `==`, yes, I see that frequently. But relying on object identity 
to see whether a new string was created by a method, no.

If you want to know whether a prefix/suffix was removed, there's a more 
reliable way than identity and a cheaper way than O(N) equality. Just 
compare the length of the string before and after. If the lengths are 
the same, nothing was removed.
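
The length comparison can be sketched as follows. The cutsuffix function
here is a stand-in written for this example (the proposed method does not
exist yet), and suffix_was_removed is a hypothetical helper name.

```python
def cutsuffix(s, suf):
    """Stand-in for the proposed method: drop a trailing suf if present."""
    return s[:-len(suf)] if suf and s.endswith(suf) else s


def suffix_was_removed(s, suf):
    """Return (result, removed) using only a length comparison."""
    result = cutsuffix(s, suf)
    # len() is O(1) for str, so this avoids an O(N) equality check
    # and does not depend on object identity.
    return result, len(result) != len(s)


print(suffix_was_removed("report.txt", ".txt"))  # ('report', True)
print(suffix_was_removed("report.txt", ".csv"))  # ('report.txt', False)
```

An empty suffix leaves the length unchanged, so it is reported as
"nothing removed", which matches the wording above.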


-- 
Steven
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/ATVEUSROY3BSUK5BDPPS5A75TRCRR4TD/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Steven D'Aprano
On Fri, Mar 20, 2020 at 06:18:20PM -0700, Nathaniel Smith wrote:
> On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney
>  wrote:
> > This is a proposal to add two new methods, ``cutprefix`` and
> > ``cutsuffix``, to the APIs of Python's various string objects.
> 
> The names should use "start" and "end" instead of "prefix" and
> "suffix", to reduce the jargon factor 

Prefix and suffix aren't jargon. They teach those words to kids in 
primary school.

Why the concern over "jargon"? We happily talk about exception, 
metaclass, thread, process, CPU, gigabyte, async, ethernet, socket, 
hexadecimal, iterator, class, instance, HTTP, boolean, etc without 
blinking, but you're shying at prefix and suffix?



-- 
Steven
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/TJTOR4IZ6ESCSCGCV2WFG4ABSM7HZ2F4/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Eric V. Smith

On 3/21/2020 2:09 PM, Steven D'Aprano wrote:

On Sat, Mar 21, 2020 at 12:15:21PM -0400, Eric V. Smith wrote:

On 3/21/2020 11:20 AM, Ned Batchelder wrote:

Why be so prescriptive? The semantics of these functions should be
about what the resulting string contains.  Leave it to implementors to
decide when it is OK to return self or not.

I agree with Ned -- whether the string object is returned unchanged or a
copy is an implementation decision, not a language decision.


[Eric]

The only reason I can think of is to enable the test above: did a
suffix/prefix removal take place? That seems like a useful thing.

We don't make this guarantee about string identity for any other string
method, and CPython's behaviour varies from method to method:

 py> s = 'a b c'
 py> s is s.strip()
 True
 py> s is s.lower()
 False

and version to version:

 py> s is s.replace('a', 'a')  # 2.7
 False
 py> s is s.replace('a', 'a')  # 3.5
 True

I've never seen anyone relying on this behaviour, and I don't expect
these new methods will change that. Thinking that `is` is another way of
writing `==`, yes, I see that frequently. But relying on object identity
to see whether a new string was created by a method, no.

Agreed. I think the PEP should just write the Python pseudo-code without
copying, and it should mention that whether or not the original string 
is returned is an implementation detail. Then I think the actual 
documentation should just omit any discussion of it, like the existing 
docs for lstrip().

If you want to know whether a prefix/suffix was removed, there's a more
reliable way than identity and a cheaper way than O(N) equality. Just
compare the length of the string before and after. If the lengths are
the same, nothing was removed.


That's a good point. This should probably go in the PEP, and maybe the 
documentation.


Eric
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/SOLOVVZXIGEAFZKX223YEAWLP6G5TSB7/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Rob Cliffe via Python-Dev



On 21/03/2020 16:15, Eric V. Smith wrote:

On 3/21/2020 11:20 AM, Ned Batchelder wrote:

On 3/20/20 9:34 PM, Cameron Simpson wrote:

On 20Mar2020 13:57, Eric Fahlgren  wrote:
On Fri, Mar 20, 2020 at 11:56 AM Dennis Sweeney wrote:

If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then
``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has
been removed.  If ``s`` does not have ``pre`` as a prefix, an
unchanged copy of ``s`` is returned.  In summary, ``s.cutprefix(pre)``
is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.

The second sentence above unambiguously states that cutprefix returns 'an
unchanged *copy*', but the example contradicts that and shows that 'self'
may be returned and not a copy.  I think it should be reworded to
explicitly allow the optimization of returning self.


My versions of these (plain old functions) return self if unchanged, 
and are explicitly documented as doing so.


This has the concrete advantage that one can test for nonremoval of
the suffix with "is", which is very fast, instead of == which may
not be.

So one writes (assuming methods):

   prefix = cutsuffix(s, 'abc')
   if prefix is s:
       ... no change
   else:
       ... definitely changed, s != prefix also

I am explicitly in favour of returning self if unchanged.


Why be so prescriptive? The semantics of these functions should be 
about what the resulting string contains.  Leave it to implementors 
to decide when it is OK to return self or not.


The only reason I can think of is to enable the test above: did a 
suffix/prefix removal take place? That seems like a useful thing. I 
think if we don't specify the behavior one way or the other, people 
are going to rely on CPython's behavior here, consciously or not.


Is there some python implementation that would have a problem with the 
"is" test, if we were being this prescriptive? Honest question.


Of course this would open the question of what to do if the suffix is 
the empty string. But since "'foo'.startswith('')" is True, maybe we'd 
have to return a copy in that case. It would be odd to have 
"s.startswith('')" be true, but "s.cutprefix('') is s" also be True. 
Or, since there's already talk in the PEP about what happens if the 
prefix/suffix is the empty string, and if we adopt the "is" behavior 
we'd add more details there. Like "if the result is the same object as 
self, it means either the suffix is the empty string, or self didn't 
start with the suffix".


Eric

*If* no python implementation would have a problem with the "is" test 
(and from a position of total ignorance I would guess that this is the 
case :-)), then it would be a useful feature and it is easier to define 
it now than try to force conformance later.  I have no problem with 
's.startswith("") == True and s.cutprefix("") is s'.  YMMV.

Rob Cliffe
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/R374V42FBOHAKJAXNSF2DQWF3HK763KS/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Walter Dörwald

On 21 Mar 2020, at 19:09, Steven D'Aprano wrote:


On Sat, Mar 21, 2020 at 12:15:21PM -0400, Eric V. Smith wrote:

On 3/21/2020 11:20 AM, Ned Batchelder wrote:



Why be so prescriptive? The semantics of these functions should be
about what the resulting string contains.  Leave it to implementors to
decide when it is OK to return self or not.


I agree with Ned -- whether the string object is returned unchanged or a
copy is an implementation decision, not a language decision.


[Eric]

The only reason I can think of is to enable the test above: did a
suffix/prefix removal take place? That seems like a useful thing.


We don't make this guarantee about string identity for any other string
method, and CPython's behaviour varies from method to method:

py> s = 'a b c'
py> s is s.strip()
True
py> s is s.lower()
False

and version to version:

py> s is s.replace('a', 'a')  # 2.7
False
py> s is s.replace('a', 'a')  # 3.5
True


And it is different for string subclasses, because the method always 
returns an instance of the baseclass:



>>> class str2(str):
...     pass
...
>>> isinstance(str2('a b c').strip(), str2)
False
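
A small sketch of the same point, runnable as a script (the class name ``Str2`` is only illustrative, and the results described are what CPython does, not a language guarantee): even a ``strip()`` call that has nothing to strip yields a plain ``str`` for a subclass instance, so an identity test cannot succeed there.

```python
class Str2(str):
    """Illustrative str subclass with no extra behaviour."""


s = Str2("a b c")
stripped = s.strip()  # nothing to strip, so the value is unchanged

print(stripped == s)          # compares by value
print(type(stripped) is str)  # on CPython the result is a base str, not Str2
print(stripped is s)          # so it cannot be the same object as s
```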

Servus,
   Walter
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/JNIVR6IZAG7GEDREHCEHD25KANJDTR3C/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Chris Angelico
On Sun, Mar 22, 2020 at 5:41 AM Steven D'Aprano  wrote:
>
> On Fri, Mar 20, 2020 at 06:18:20PM -0700, Nathaniel Smith wrote:
> > On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney
> >  wrote:
> > > This is a proposal to add two new methods, ``cutprefix`` and
> > > ``cutsuffix``, to the APIs of Python's various string objects.
> >
> > The names should use "start" and "end" instead of "prefix" and
> > "suffix", to reduce the jargon factor
>
> Prefix and suffix aren't jargon. They teach those words to kids in
> primary school.
>
> Why the concern over "jargon"? We happily talk about exception,
> metaclass, thread, process, CPU, gigabyte, async, ethernet, socket,
> hexadecimal, iterator, class, instance, HTTP, boolean, etc without
> blinking, but you're shying at prefix and suffix?
>

As a general rule, jargon from your OWN domain is easier to justify
than jargon from some OTHER domain. (Though in this case, I agree that
"prefix" and "suffix" shouldn't be a problem.)

ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/BSCTUBWDBFHOJQJKBX2HFTDSWHI6VGCV/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Ned Batchelder

On 3/21/20 12:51 PM, Rob Cliffe via Python-Dev wrote:



On 21/03/2020 16:15, Eric V. Smith wrote:

On 3/21/2020 11:20 AM, Ned Batchelder wrote:

On 3/20/20 9:34 PM, Cameron Simpson wrote:

On 20Mar2020 13:57, Eric Fahlgren  wrote:
On Fri, Mar 20, 2020 at 11:56 AM Dennis Sweeney wrote:

If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then
``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has
been removed.  If ``s`` does not have ``pre`` as a prefix, an
unchanged copy of ``s`` is returned.  In summary, ``s.cutprefix(pre)``
is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.

The second sentence above unambiguously states that cutprefix returns 'an
unchanged *copy*', but the example contradicts that and shows that 'self'
may be returned and not a copy.  I think it should be reworded to
explicitly allow the optimization of returning self.


My versions of these (plain old functions) return self if 
unchanged, and are explicitly documented as doing so.


This has the concrete advantage that one can test for nonremoval of
the suffix with "is", which is very fast, instead of == which may
not be.

So one writes (assuming methods):

   prefix = cutsuffix(s, 'abc')
   if prefix is s:
       ... no change
   else:
       ... definitely changed, s != prefix also

I am explicitly in favour of returning self if unchanged.


Why be so prescriptive? The semantics of these functions should be 
about what the resulting string contains.  Leave it to implementors 
to decide when it is OK to return self or not.


The only reason I can think of is to enable the test above: did a 
suffix/prefix removal take place? That seems like a useful thing. I 
think if we don't specify the behavior one way or the other, people 
are going to rely on CPython's behavior here, consciously or not.


Is there some python implementation that would have a problem with 
the "is" test, if we were being this prescriptive? Honest question.


Of course this would open the question of what to do if the suffix is 
the empty string. But since "'foo'.startswith('')" is True, maybe 
we'd have to return a copy in that case. It would be odd to have 
"s.startswith('')" be true, but "s.cutprefix('') is s" also be True. 
Or, since there's already talk in the PEP about what happens if the 
prefix/suffix is the empty string, and if we adopt the "is" behavior 
we'd add more details there. Like "if the result is the same object 
as self, it means either the suffix is the empty string, or self 
didn't start with the suffix".


Eric

*If* no python implementation would have a problem with the "is" test 
(and from a position of total ignorance I would guess that this is the 
case :-)), then it would be a useful feature and it is easier to 
define it now than try to force conformance later.  I have no problem 
with 's.startswith("") == True and s.cutprefix("") is s'.  YMMV.


Why take on that "*If*" conditional?  We're constantly telling people 
not to compare strings with "is".  So why define how "is" will behave in 
this PEP?  It's the implementation's decision whether to return a new 
immutable object with the same value, or the same object.


As Steven points out elsewhere in this thread, the behavior of Python's
builtins differs across methods and versions in this regard.  I
certainly didn't know that, and it was probably news to you as well.  So 
why do we need to nail it down for suffixes and prefixes?


There will be no conformance to force later, because if the value 
doesn't change, then it doesn't matter whether it's a new string or the 
same string.


--Ned.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/S3ILKW2HBJWFO4FCHEXITTP445URVHEG/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Dennis Sweeney
Hi Victor. I accidentally created a new thread, but I intended everything below 
as a response:

Thanks for the review!

> In short, I propose:
>     def cutprefix(self: str, prefix: str, /) -> str:
>         if self.startswith(prefix) and prefix:
>             return self[len(prefix):]
>         else:
>             return self
> I call startswith() before testing if pre is non-empty to inherit
> startswith()'s input type validation. For example, "a".startswith(b'x')
> raises a TypeError.

This still erroneously accepts tuples and would return str
subclasses unchanged. If we want the Python code to be the spec, with
accurate type-checking, then perhaps we want:

    def cutprefix(self: str, prefix: str, /) -> str:
        if not isinstance(prefix, str):
            raise TypeError(f'cutprefix() argument must be str, '
                            f'not {type(prefix).__qualname__}')
        self = str(self)
        prefix = str(prefix)
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self
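
For the suffix side, a sketch under the same assumptions might look like the following. This is only an illustration, not text from the PEP. The one extra wrinkle is the empty suffix: ``self[:-0]`` is the empty string, so the emptiness test has to come before the slice.

```python
def cutsuffix(self: str, suffix: str, /) -> str:
    if not isinstance(suffix, str):
        raise TypeError(f'cutsuffix() argument must be str, '
                        f'not {type(suffix).__qualname__}')
    self = str(self)
    suffix = str(suffix)
    # Guard against the empty suffix: self[:-0] would be '' rather than self.
    if suffix and self.endswith(suffix):
        return self[:-len(suffix)]
    else:
        return self


print(cutsuffix("test_case.py", ".py"))
print(cutsuffix("test_case.py", ""))
```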

For accepting multiple prefixes, I can't tell if there's a consensus about
whether ``s = s.cutprefix("a", "b", "c")`` should be the same as

    for prefix in ["a", "b", "c"]:
        s = s.cutprefix(prefix)

or

    for prefix in ["a", "b", "c"]:
        if s.startswith(prefix):
            s = s.cutprefix(prefix)
            break

The latter seems to be harder for users to implement through other means, and
it's the behavior that test_concurrent_futures.py has implemented now, so maybe
that's what we want. Also, it seems more elegant to me to accept variadic
arguments, rather than a single tuple of arguments. Is it worth it to match the
related-but-not-the-same API of "startswith" if it makes for uglier Python? My
gut reaction is to prefer the varargs, but maybe someone has a different
perspective.
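
To make the difference between the two readings concrete, here is a small sketch using plain functions. The names ``cutprefix``, ``cut_each`` and ``cut_first`` are hypothetical and exist only for this illustration.

```python
def cutprefix(s: str, prefix: str) -> str:
    # Hypothetical single-prefix helper standing in for the proposed method.
    return s[len(prefix):] if prefix and s.startswith(prefix) else s


def cut_each(s: str, *prefixes: str) -> str:
    # First reading: try every prefix in turn, cutting whenever one matches.
    for prefix in prefixes:
        s = cutprefix(s, prefix)
    return s


def cut_first(s: str, *prefixes: str) -> str:
    # Second reading: cut only the first prefix that matches, then stop.
    for prefix in prefixes:
        if s.startswith(prefix):
            return cutprefix(s, prefix)
    return s


print(repr(cut_each("abc", "a", "b", "c")))   # ''
print(repr(cut_first("abc", "a", "b", "c")))  # 'bc'
```

With the first reading, each cut can expose a new match for a later prefix. The second reading removes at most one prefix.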

I can submit a revision to the PEP with some changes soon.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/NYVDSQ7XB3KOXREY5FUALEILB2UCUVD3/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Dennis Sweeney
Even then, it seems that prefix is an established computer science term:

[1] https://en.wikipedia.org/wiki/Substring#Prefix
[2] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. (1990). 
Introduction to Algorithms (1st ed.). Chapter 15.4: Longest common subsequence

And a quick search reveals that it's used hundreds of times in the docs: 
https://docs.python.org/3/search.html?q=prefix
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/R7CC6LEZHVLTILXGYFYGVXYTDANVJFNF/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Nick Coghlan
On Sat., 21 Mar. 2020, 11:19 am Nathaniel Smith,  wrote:

> On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney
>  wrote:
> > This is a proposal to add two new methods, ``cutprefix`` and
> > ``cutsuffix``, to the APIs of Python's various string objects.
>
> The names should use "start" and "end" instead of "prefix" and
> "suffix", to reduce the jargon factor and for consistency with
> startswith/endswith.
>

This would also be more consistent with startswith() & endswith(). (For
folks querying this: the relevant domain here is "str builtin method
names", and we already use startswith/endswith there, not
hasprefix/hassuffix. The most challenging relevant audience for new str
builtin method *names* is also 10 year olds learning to program in school,
not adults reading the documentation)

I think the concern about stripstart() & stripend() working with
substrings, while strip/lstrip/rstrip work with character sets, is valid,
but I also share the concern about introducing "cut" as yet another verb to
learn in the already wide string API.

The example where the new function was used instead of a questionable use
of replace gave me an idea, though: what if the new functions were
"replacestart()" and "replaceend()"?

* uses "start" and "with" for consistency with the existing checks
* substring based, like the "replace" method
* can be combined with an extension of "replace()" to also accept a tuple
of old values to match and replace to allow for consistency with checking
for multiple prefixes or suffixes.

We'd expect the most common case to be the empty string, but I think the
meaning of the following is clear, and consistent with the current practice
of using replace() to delete text from anywhere within the string:

s = s.replacestart('context.' , '')

This approach would also very cleanly handle the last example from the PEP:

s = s.replaceend(('Mixin', 'Tests', 'Test'), '')

The doubled 'e' in 'replaceend' isn't ideal, but if we went this way, I
think keeping consistency with other str method names would be preferable
to adding an underscore to the name.

Interestingly, you could also use this to match multiple prefixes or
suffixes and find out *which one* matched (since the existing methods don't
report that):

s2 = s.replaceend(suffixes, '')
suffix_len = len(s) - len(s2)
suffix = s[-suffix_len:] if suffix_len else None
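
A runnable sketch of that idea, with ``replaceend`` written as a hypothetical plain function (no such str method exists, and the tuple handling is only an assumption modelled on ``endswith()``):

```python
def replaceend(s: str, old, new: str) -> str:
    # Hypothetical helper: 'old' is a str or a tuple of str, as with endswith().
    candidates = (old,) if isinstance(old, str) else tuple(old)
    for suffix in candidates:
        # Skip empty candidates and replace only the first suffix that matches.
        if suffix and s.endswith(suffix):
            return s[:len(s) - len(suffix)] + new
    return s


suffixes = ('Mixin', 'Tests', 'Test')
s = 'ExecutorShutdownTests'
s2 = replaceend(s, suffixes, '')
suffix_len = len(s) - len(s2)
suffix = s[-suffix_len:] if suffix_len else None
print(s2, suffix)
```

The length difference identifies the matched suffix only because the replacement is the empty string.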

Cheers,
Nick.


>
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/VQULYFFT4VVXV35RE5ETR5MOZSHLPFTV/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Nathaniel Smith
On Sat, Mar 21, 2020 at 11:35 AM Steven D'Aprano  wrote:
>
> On Fri, Mar 20, 2020 at 06:18:20PM -0700, Nathaniel Smith wrote:
> > On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney
> >  wrote:
> > > This is a proposal to add two new methods, ``cutprefix`` and
> > > ``cutsuffix``, to the APIs of Python's various string objects.
> >
> > The names should use "start" and "end" instead of "prefix" and
> > "suffix", to reduce the jargon factor
>
> Prefix and suffix aren't jargon. They teach those words to kids in
> primary school.

Whereas they don't have to teach "start" and "end", because kids
already know them before they start school.

> Why the concern over "jargon"? We happily talk about exception,
> metaclass, thread, process, CPU, gigabyte, async, ethernet, socket,
> hexadecimal, iterator, class, instance, HTTP, boolean, etc without
> blinking, but you're shying at prefix and suffix?

Yeah. Jargon is fine when there's no regular word with appropriate
precision, but we shouldn't use jargon just for jargon's sake. Python
has a long tradition of preferring regular words when possible, e.g.
using not/and/or instead of !/&&/||, and startswith/endswith instead
of hasprefix/hassuffix.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/SMZB6KII42ZSLOFJGDMFRXXPM72UGQ3D/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Kyle Stanley
Nick Coghlan wrote:
> The example where the new function was used instead of a questionable use
> of replace gave me an idea, though: what if the new functions were
> "replacestart()" and "replaceend()"?
>
> * uses "start" and "with" for consistency with the existing checks
> * substring based, like the "replace" method
> * can be combined with an extension of "replace()" to also accept a tuple
> of old values to match and replace to allow for consistency with checking
> for multiple prefixes or suffixes.

FWIW, I don't place as much value on being consistent with "startswith()"
and "endswith()". But with it being substring based, I think the term
"replace" actually makes a lot more sense here compared to "cut". +1


On Sat, Mar 21, 2020 at 9:46 PM Nick Coghlan  wrote:

> On Sat., 21 Mar. 2020, 11:19 am Nathaniel Smith,  wrote:
>
>> On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney
>>  wrote:
>> > This is a proposal to add two new methods, ``cutprefix`` and
>> > ``cutsuffix``, to the APIs of Python's various string objects.
>>
>> The names should use "start" and "end" instead of "prefix" and
>> "suffix", to reduce the jargon factor and for consistency with
>> startswith/endswith.
>>
>
> This would also be more consistent with startswith() & endswith(). (For
> folks querying this: the relevant domain here is "str builtin method
> names", and we already use startswith/endswith there, not
> hasprefix/hassuffix. The most challenging relevant audience for new str
> builtin method *names* is also 10 year olds learning to program in school,
> not adults reading the documentation)
>
> I think the concern about stripstart() & stripend() working with
> substrings, while strip/lstrip/rstrip work with character sets, is valid,
> but I also share the concern about introducing "cut" as yet another verb to
> learn in the already wide string API.
>
> The example where the new function was used instead of a questionable use
> of replace gave me an idea, though: what if the new functions were
> "replacestart()" and "replaceend()"?
>
> * uses "start" and "with" for consistency with the existing checks
> * substring based, like the "replace" method
> * can be combined with an extension of "replace()" to also accept a tuple
> of old values to match and replace to allow for consistency with checking
> for multiple prefixes or suffixes.
>
> We'd expect the most common case to be the empty string, but I think the
> meaning of the following is clear, and consistent with the current practice
> of using replace() to delete text from anywhere within the string:
>
> s = s.replacestart('context.' , '')
>
> This approach would also very cleanly handle the last example from the PEP:
>
> s = s.replaceend(('Mixin', 'Tests', 'Test'), '')
>
> The doubled 'e' in 'replaceend' isn't ideal, but if we went this way, I
> think keeping consistency with other str method names would be preferable
> to adding an underscore to the name.
>
> Interestingly, you could also use this to match multiple prefixes or
> suffixes and find out *which one* matched (since the existing methods don't
> report that):
>
> s2 = s.replaceend(suffixes, '')
> suffix_len = len(s) - len(s2)
> suffix = s[-suffix_len:] if suffix_len else None
>
> Cheers,
> Nick.
>
>
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/KFDWZ3LWUIE6KHYQYU6Z5VL3SXMMMZOM/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Chris Angelico
On Sun, Mar 22, 2020 at 1:02 PM Nathaniel Smith  wrote:
>
> On Sat, Mar 21, 2020 at 11:35 AM Steven D'Aprano  wrote:
> >
> > On Fri, Mar 20, 2020 at 06:18:20PM -0700, Nathaniel Smith wrote:
> > > On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney
> > >  wrote:
> > > > This is a proposal to add two new methods, ``cutprefix`` and
> > > > ``cutsuffix``, to the APIs of Python's various string objects.
> > >
> > > The names should use "start" and "end" instead of "prefix" and
> > > "suffix", to reduce the jargon factor
> >
> > Prefix and suffix aren't jargon. They teach those words to kids in
> > primary school.
>
> Whereas they don't have to teach "start" and "end", because kids
> already know them before they start school.
>
> > Why the concern over "jargon"? We happily talk about exception,
> > metaclass, thread, process, CPU, gigabyte, async, ethernet, socket,
> > hexadecimal, iterator, class, instance, HTTP, boolean, etc without
> > blinking, but you're shying at prefix and suffix?
>
> Yeah. Jargon is fine when there's no regular word with appropriate
> precision, but we shouldn't use jargon just for jargon's sake. Python
> has a long tradition of preferring regular words when possible, e.g.
> using not/and/or instead of !/&&/||, and startswith/endswith instead
> of hasprefix/hassuffix.
>

Given that the word "prefix" appears in help("".startswith), I don't
think there's really a lot to be gained by arguing this point :)
There's absolutely nothing wrong with the word.

But Dennis, welcome to the wonderful world of change proposals, where
you will experience insane amounts of pushback and debate on the
finest points of bikeshedding, whether or not people actually even
support the proposal at all...

ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/UFV5CPYKJSRGOAOIHVZMBSQ3HY6B5VDE/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Rob Cliffe via Python-Dev



On 21/03/2020 20:16, Ned Batchelder wrote:

On 3/21/20 12:51 PM, Rob Cliffe via Python-Dev wrote:



On 21/03/2020 16:15, Eric V. Smith wrote:

On 3/21/2020 11:20 AM, Ned Batchelder wrote:

On 3/20/20 9:34 PM, Cameron Simpson wrote:

On 20Mar2020 13:57, Eric Fahlgren  wrote:
On Fri, Mar 20, 2020 at 11:56 AM Dennis Sweeney wrote:

If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then
``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has
been removed.  If ``s`` does not have ``pre`` as a prefix, an
unchanged copy of ``s`` is returned.  In summary, ``s.cutprefix(pre)``
is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.

The second sentence above unambiguously states that cutprefix returns 'an
unchanged *copy*', but the example contradicts that and shows that 'self'
may be returned and not a copy.  I think it should be reworded to
explicitly allow the optimization of returning self.


My versions of these (plain old functions) return self if 
unchanged, and are explicitly documented as doing so.


This has the concrete advantage that one can test for nonremoval of
the suffix with "is", which is very fast, instead of == which may
not be.

So one writes (assuming methods):

   prefix = cutsuffix(s, 'abc')
   if prefix is s:
       ... no change
   else:
       ... definitely changed, s != prefix also

I am explicitly in favour of returning self if unchanged.


Why be so prescriptive? The semantics of these functions should be 
about what the resulting string contains.  Leave it to implementors 
to decide when it is OK to return self or not.


The only reason I can think of is to enable the test above: did a 
suffix/prefix removal take place? That seems like a useful thing. I 
think if we don't specify the behavior one way or the other, people 
are going to rely on CPython's behavior here, consciously or not.


Is there some python implementation that would have a problem with 
the "is" test, if we were being this prescriptive? Honest question.


Of course this would open the question of what to do if the suffix 
is the empty string. But since "'foo'.startswith('')" is True, maybe 
we'd have to return a copy in that case. It would be odd to have 
"s.startswith('')" be true, but "s.cutprefix('') is s" also be True. 
Or, since there's already talk in the PEP about what happens if the 
prefix/suffix is the empty string, and if we adopt the "is" behavior 
we'd add more details there. Like "if the result is the same object 
as self, it means either the suffix is the empty string, or self 
didn't start with the suffix".


Eric

*If* no python implementation would have a problem with the "is" test 
(and from a position of total ignorance I would guess that this is 
the case :-)), then it would be a useful feature and it is easier to 
define it now than try to force conformance later. I have no problem 
with 's.startswith("") == True and s.cutprefix("") is s'.  YMMV.


Why take on that "*If*" conditional?  We're constantly telling people 
not to compare strings with "is".  So why define how "is" will behave 
in this PEP?  It's the implementation's decision whether to return a 
new immutable object with the same value, or the same object.


As Steven points out elsewhere in this thread, the behavior of Python's
builtins differs across methods and versions in this regard.  I
certainly didn't know that, and it was probably news to you as well.  
So why do we need to nail it down for suffixes and prefixes?


There will be no conformance to force later, because if the value 
doesn't change, then it doesn't matter whether it's a new string or 
the same string.


--Ned.

Conceded.
Rob Cliffe
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/I5VOEF3742I2QKTSKS2D4YA6IB6OR3GS/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Dennis Sweeney
> But Dennis, welcome to the wonderful world of change proposals, where
> you will experience insane amounts of pushback and debate on the
> finest points of bikeshedding, whether or not people actually even
> support the proposal at all...

Lol -- thanks!

In my mind, another reason that I like including the words "prefix" and
"suffix" over "start" and "end" is that, even though using the verb "end" in
"endswith" is unambiguous, the noun "end" can be used as either the initial or
final end, as in "remove this thing from both ends of the string". So "suffix"
feels more precise to me.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/7UHLOAR6NTVNLN3RBQP6ONHTLTDGXLQW/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Guido van Rossum
On Sat, Mar 21, 2020 at 6:46 PM Nick Coghlan  wrote:

> On Sat., 21 Mar. 2020, 11:19 am Nathaniel Smith,  wrote:
>
>> On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney
>>  wrote:
>> > This is a proposal to add two new methods, ``cutprefix`` and
>> > ``cutsuffix``, to the APIs of Python's various string objects.
>>
>> The names should use "start" and "end" instead of "prefix" and
>> "suffix", to reduce the jargon factor and for consistency with
>> startswith/endswith.
>>
>
> This would also be more consistent with startswith() & endswith(). (For
> folks querying this: the relevant domain here is "str builtin method
> names", and we already use startswith/endswith there, not
> hasprefix/hassuffix. The most challenging relevant audience for new str
> builtin method *names* is also 10 year olds learning to program in school,
> not adults reading the documentation)
>

To my language sense, hasprefix/hassuffix are horrible compared to
startswith/endswith. If you were to talk about this kind of condition using
English instead of Python, you wouldn't say "if x has prefix y", you'd say
"if x starts with y". (I doubt any programming language uses hasPrefix or
has_prefix for this, making it a strawman.)

*But*, what would you say if you wanted to express the idea of removing
something from the start or end? It's pretty verbose to say "remove y from
the end of x", and it's not easy to translate that into a method name.
x.removefromend(y)? Blech! And x.removeend(y) has the double 'e', which
confuses the reader.

The thing is that it's hard to translate "starts" (a verb) into a noun --
the "start" of something is its very beginning (i.e., in Python, position
zero), while a "prefix" is a noun that specifically describes an initial
substring (and I'm glad we don't have to use *that* :-).


> I think the concern about stripstart() & stripend() working with
> substrings, while strip/lstrip/rstrip work with character sets, is valid,
> but I also share the concern about introducing "cut" as yet another verb to
> learn in the already wide string API.
>

It's not great, and I actually think that "stripprefix" and "stripsuffix"
are reasonable. (I found that in Go, everything we call "strip" is called
"Trim", and there are "TrimPrefix" and "TrimSuffix" functions that
correspond to the PEP 616 functions.)


> The example where the new function was used instead of a questionable use
> of replace gave me an idea, though: what if the new functions were
> "replacestart()" and "replaceend()"?
>
> * uses "start" and "with" for consistency with the existing checks
> * substring based, like the "replace" method
> * can be combined with an extension of "replace()" to also accept a tuple
> of old values to match and replace to allow for consistency with checking
> for multiple prefixes or suffixes.
>
> We'd expect the most common case to be the empty string, but I think the
> meaning of the following is clear, and consistent with the current practice
> of using replace() to delete text from anywhere within the string:
>
> s = s.replacestart('context.' , '')
>

This feels like a hypergeneralization. In 99.9% of use cases we just need
to remove the prefix or suffix. If you want to replace the suffix with
something else, you can probably use string concatenation. (In the one use
case I can think of, changing "foo.c" into "foo.o", it would make sense
that plain "foo" ended up becoming "foo.o", so s.stripsuffix(".c") + ".o"
actually works better there.)
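
A minimal sketch of the "foo.c" case, assuming a hypothetical plain function
named stripsuffix (the method name was still under discussion, so this is an
illustration only, not an existing str method):

```python
def stripsuffix(s: str, suffix: str) -> str:
    """Hypothetical helper: drop `suffix` from the end of `s` if present."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


def object_name(source: str) -> str:
    # Remove ".c" when it is there, then always append ".o".
    return stripsuffix(source, ".c") + ".o"


print(object_name("foo.c"))  # foo.o
print(object_name("foo"))    # foo.o
```

With concatenation, a name that lacks the ".c" suffix still ends up with ".o"
appended, which a replace-only-if-matched operation would not do.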


> This approach would also very cleanly handle the last example from the PEP:
>
> s = s.replaceend(('Mixin', 'Tests', 'Test'), '')
>

Maybe the proposed functions can optionally take a tuple of
prefixes/suffixes, like startswith/endswith do?
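
One way this could look, as a rough sketch only: the function name, the
"first match in tuple order wins" rule, and the handling of empty suffixes
are all assumptions made for illustration, not settled behavior.

```python
from typing import Tuple, Union


def stripsuffix(s: str, suffixes: Union[str, Tuple[str, ...]]) -> str:
    """Hypothetical helper: remove the first matching suffix, if any."""
    if isinstance(suffixes, str):
        suffixes = (suffixes,)
    for suffix in suffixes:
        # Skip empty suffixes so that s[:-0] is never evaluated.
        if suffix and s.endswith(suffix):
            return s[:-len(suffix)]
    return s


candidates = ("Mixin", "Tests", "Test")
print(stripsuffix("LoggingTests", candidates))  # Logging
print(stripsuffix("LoggingTest", candidates))   # Logging
print(stripsuffix("Logging", candidates))       # Logging
```

Note that the order of the tuple matters under this rule: listing "Test"
before "Tests" would never remove the trailing "s".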


> The doubled 'e' in 'replaceend' isn't ideal, but if we went this way, I
> think keeping consistency with other str method names would be preferable
> to adding an underscore to the name.
>

Agreed on the second part, I just really don't like the 'ee'.


> Interestingly, you could also use this to match multiple prefixes or
> suffixes and find out *which one* matched (since the existing methods don't
> report that):
>
> s2 = s.replaceend(suffixes, '')
> suffix_len = len(s) - len(s2)
> suffix = s[-suffix_len:] if suffix_len else None
>
> Cheers,
> Nick.
>

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/Q33NGX3N4JEI3ECUW3WBL33EX2JR3Y5C/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Dennis Sweeney
Is there a proven use case for anything other than the empty string as the 
replacement? I prefer your "replacewhatever" to another "stripwhatever" name, 
and I think it's clear and nicely fits the behavior you proposed. But should we 
allow a naming convenience to dictate that the behavior should be generalized 
to a use case we're not sure exists, where the same argument is passed 99% 
of the time? I think a downside would be that a 
pass-a-string-or-a-tuple-of-strings interface would be more mental effort to 
keep track of than a ``*args`` variadic interface for 
"(cut/remove/without/trim)prefix", even if the former is how ``startswith()`` 
works.
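
To compare the two call shapes side by side, here is a small sketch. Both
function names are hypothetical and exist only for this illustration.

```python
def cutprefix_variadic(s: str, *prefixes: str) -> str:
    """Hypothetical: candidates are passed as separate positional arguments."""
    for prefix in prefixes:
        if s.startswith(prefix):
            return s[len(prefix):]
    return s


def cutprefix_tuple(s, prefixes):
    """Hypothetical: one str or a tuple of str, as startswith() accepts."""
    if isinstance(prefixes, str):
        prefixes = (prefixes,)
    return cutprefix_variadic(s, *prefixes)


print(cutprefix_variadic("test_foo", "test_", "check_"))  # foo
print(cutprefix_tuple("test_foo", ("test_", "check_")))   # foo
print(cutprefix_tuple("test_foo", "test_"))               # foo
```

The tuple form needs the extra isinstance check, which is the bookkeeping
the variadic form avoids.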
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/KS25JX4V5LR3ZCV4EXU763RLTT24D4JT/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Ivan Pozdeev via Python-Dev



On 20.03.2020 21:52, Dennis Sweeney wrote:

Browser Link: https://www.python.org/dev/peps/pep-0616/

PEP: 616
Title: String methods to remove prefixes and suffixes
Author: Dennis Sweeney 
Sponsor: Eric V. Smith 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Mar-2020
Python-Version: 3.9
Post-History: 30-Aug-2002


Abstract
========

This is a proposal to add two new methods, ``cutprefix`` and
``cutsuffix``, to the APIs of Python's various string objects.  In
particular, the methods would be added to Unicode ``str`` objects,
binary ``bytes`` and ``bytearray`` objects, and
``collections.UserString``.


Does it need to be separate methods? Can we augment or even replace *strip() 
instead?

E.g.

*strip(chars: str, line: str) -> str

As written in the PEP preface, the very reason for the PEP is that people are continuously trying to use *strip methods for the suggested 
functionality -- which shows that this is where they are expecting to find it.


(as a bonus, we'll be saved from bikeshedding debates over the names)

---

Then, https://mail.python.org/archives/list/python-id...@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/ suggests that the use of strip 
with character set argument may have fallen out of favor since its adoption.


If that's the case, it can be deprecated in favor of the new use, thus saving 
us from extra complexity in the long run.



If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then
``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has
been removed.  If ``s`` does not have ``pre`` as a prefix, an
unchanged copy of ``s`` is returned.  In summary, ``s.cutprefix(pre)``
is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.

The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is
roughly equivalent to
``s[:-len(suf)] if suf and s.endswith(suf) else s``.


Rationale
=========

There have been repeated issues [#confusion]_ on the Bug Tracker
and StackOverflow related to user confusion about the existing
``str.lstrip`` and ``str.rstrip`` methods.  These users are typically
expecting the behavior of ``cutprefix`` and ``cutsuffix``, but they
are surprised that the parameter for ``lstrip`` is interpreted as a
set of characters, not a substring.  This repeated issue is evidence
that these methods are useful, and the new methods allow a cleaner
redirection of users to the desired behavior.
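
The confusion described above can be reproduced with the existing methods.
This is a small illustration with made-up example strings; only
``str.lstrip``, ``str.rstrip`` and ``str.endswith`` are used.

```python
filename = "tracker.tar"

# rstrip() treats ".tar" as the character set {'.', 't', 'a', 'r'} and keeps
# removing trailing characters for as long as they belong to that set.
print(filename.rstrip(".tar"))  # tracke

# lstrip() behaves the same way at the other end of the string.
print("football".lstrip("foo"))  # tball

# Removing the substring itself currently needs an explicit check and slice.
suffix = ".tar"
stem = filename[:-len(suffix)] if filename.endswith(suffix) else filename
print(stem)  # tracker
```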

As another testimonial for the usefulness of these methods, several
users on Python-Ideas [#pyid]_ reported frequently including similar
functions in their own code for productivity.  The implementation
often contained subtle mistakes regarding the handling of the empty
string (see `Specification`_).


Specification
=============

The builtin ``str`` class will gain two new methods with roughly the
following behavior::

    def cutprefix(self: str, pre: str, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        return self[:]

    def cutsuffix(self: str, suf: str, /) -> str:
        if suf and self.endswith(suf):
            return self[:-len(suf)]
        return self[:]

The only difference between the real implementation and the above is
that, as with other string methods like ``replace``, the
methods will raise a ``TypeError`` if any of ``self``, ``pre`` or
``suf`` is not an instance of ``str``, and will cast subclasses of
``str`` to builtin ``str`` objects.

Note that without the check for the truthiness of ``suf``,
``s.cutsuffix('')`` would be mishandled and always return the empty
string due to the unintended evaluation of ``self[:-0]``.
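
The empty-suffix pitfall can be seen by comparing two plain-function
sketches. ``cutsuffix_naive`` is a hypothetical name for the version without
the truthiness check; neither function is the real implementation.

```python
def cutsuffix_naive(s: str, suf: str) -> str:
    # No truthiness check: every string ends with '', and s[:-0] is s[:0].
    if s.endswith(suf):
        return s[:-len(suf)]
    return s


def cutsuffix(s: str, suf: str) -> str:
    # The check on suf keeps the empty suffix from reaching the slice.
    if suf and s.endswith(suf):
        return s[:-len(suf)]
    return s


print(repr(cutsuffix_naive("spam", "")))  # ''
print(repr(cutsuffix("spam", "")))        # 'spam'
print(repr(cutsuffix("spam", "am")))      # 'sp'
```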

Methods with the corresponding semantics will be added to the builtin
``bytes`` and ``bytearray`` objects.  If ``b`` is either a ``bytes``
or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
will accept any bytes-like object as an argument.

Note that the ``bytearray`` methods return a copy of ``self``; they do
not operate in place.

The following behavior is considered a CPython implementation detail,
but is not guaranteed by this specification::

 >>> x = 'foobar' * 10**6
 >>> x.cutprefix('baz') is x is x.cutsuffix('baz')
 True
 >>> x.cutprefix('') is x is x.cutsuffix('')
 True

That is, for CPython's immutable ``str`` and ``bytes`` objects, the
methods return the original object when the affix is not found or if
the affix is empty.  Because these types test for equality using
shortcuts for identity and length, the following equivalent
expressions are evaluated at approximately the same speed, for any
``str`` objects (or ``bytes`` objects) ``x`` and ``y``::

 >>> (True, x[len(y):]) if x.startswith(y) else (False, x)
 >>> (True, z) if x != (z := x.cutprefix(y)) else (False, x)


The two methods will also be added to ``collections.UserString``,
where they rely on the implementation of the new ``str`` methods.


Motivating examples from the Python standard library
====================================================

[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Ivan Pozdeev via Python-Dev

On 22.03.2020 6:38, Guido van Rossum wrote:

> It's not great, and I actually think that "stripprefix" and "stripsuffix"
> are reasonable. (I found that in Go, everything we call "strip" is called
> "Trim", and there are "TrimPrefix" and "TrimSuffix" functions that
> correspond to the PEP 616 functions.)


I must note that names conforming to https://www.python.org/dev/peps/pep-0008/#function-and-variable-names would be "strip_prefix" and 
"strip_suffix".




[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Guido van Rossum
On Sat, Mar 21, 2020 at 8:38 PM Guido van Rossum  wrote:
> It's not great, and I actually think that "stripprefix" and "stripsuffix"
> are reasonable.
> [explanation snipped]

Thinking a bit more, I could also get behind "removeprefix" and
"removesuffix".

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SSKFJD2NAHVQIVXWAXDD4LPG5A4I6NI5/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Kyle Stanley
Ivan Pozdeev wrote:
> I must note that names conforming to
> https://www.python.org/dev/peps/pep-0008/#function-and-variable-names would
> be "strip_prefix" and "strip_suffix".

In this case, being in line with the existing string API method names takes
priority over PEP 8, e.g. splitlines, startswith, endswith, splitlines,
etc. Although I agree that an underscore would probably be a bit easier to
read here, it would be rather confusing to randomly swap between naming
conventions within the same API. The benefit gained in *slightly* easier
readability wouldn't make up for the headache IMO.


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Kyle Stanley
> In this case, being in line with the existing string API method names
> takes priority over PEP 8, e.g. splitlines, startswith, endswith,
> splitlines, etc.

Oops, I just realized that I wrote "splitlines" twice there. I guess that
goes to show how much I use that specific method in comparison to the
others, but the point still stands. Here's a more comprehensive set of
existing string methods to better demonstrate it (Python 3.8.2):

>>> [m for m in dir(str) if not m.startswith('_')]
['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith',
'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum',
'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower',
'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join',
'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind',
'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines',
'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
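
As a quick check of the same point, the listing can be filtered for names
that contain an underscore. This is a small sketch; the result is stated for
CPython releases known at the time of writing and could change if new
methods are added.

```python
# Public str methods, as in the listing above.
public = [m for m in dir(str) if not m.startswith('_')]

# Keep only the names that use an underscore as a word separator.
with_underscore = [m for m in public if '_' in m]
print(with_underscore)  # ['format_map']
```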


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Steven D'Aprano
On Sun, Mar 22, 2020 at 06:57:52AM +0300, Ivan Pozdeev via Python-Dev wrote:

> Does it need to be separate methods?

Yes.

Overloading a single method to do two dissimilar things is poor design.


> As written in the PEP preface, the very reason for the PEP is that people 
> are continuously trying to use *strip methods for the suggested 
> functionality -- which shows that this is where they are expecting to find 
> it.

They are only expecting to find it in strip() because there is no other 
alternative where it could be. There's nothing inherent about strip that 
means to delete a prefix or suffix, but when the only other choices are 
such obviously wrong methods as upper(), find(), replace(), count() etc 
it is easy to jump to the wrong conclusion that strip does what is 
wanted.



-- 
Steven
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UUH2K5SU6DIWJZCJP34IPFM6UWH7F376/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Dennis Sweeney
I like "removeprefix" and "removesuffix". My only concern before had been 
length, but three more characters than "cut***fix" is a small price to pay for 
clarity.
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/Y4O2AIODGI2Z45A32UK5EHR7A7RLQFOK/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Gregory P. Smith
Nice PEP! That this discussion wound up in the NP-complete "naming things"
territory as the main topic right from the start/prefix/beginning speaks
highly of it. :)

The only things I have left to add are (a) agreed: don't specify whether it
is a copy or not for str and bytes, BUT (b) do specify that for bytearray.

Being the only mutable type, it matters. Consistency with other bytearray
methods based on https://docs.python.org/3/library/stdtypes.html#bytearray
suggests copy.
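
A short illustration of that existing convention, using only
``bytearray.strip`` (the new methods are not assumed to exist here):

```python
buf = bytearray(b"  payload  ")

# Existing bytearray methods return a new object and leave the original alone.
stripped = buf.strip()

print(stripped)         # bytearray(b'payload')
print(buf)              # bytearray(b'  payload  ')
print(stripped is buf)  # False
```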

(Someone always wants in-place versions of bytearray methods; that is a
separate topic, not for this PEP.)

Fwiw I *like* your cutprefix/suffix names. Avoiding the terms strip and
trim is wise to avoid confusion and having the name read as nice English is
Pythonic.  I'm not going to vote on other suggestions.

-gps

On Sat, Mar 21, 2020, 9:32 PM Kyle Stanley  wrote:

> > In this case, being in line with the existing string API method names
> take priority over PEP 8, e.g. splitlines, startswith, endswith,
> splitlines, etc.
>
> Oops, I just realized that I wrote "splitlines" twice there. I guess that
> goes to show how much I use that specific method in comparison to the
> others, but the point still stands. Here's a more comprehensive set of
> existing string methods to better demonstrate it (Python 3.8.2):
>
> >>> [m for m in dir(str) if not m.startswith('_')]
> ['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith',
> 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum',
> 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower',
> 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join',
> 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind',
> 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines',
> 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>
> On Sun, Mar 22, 2020 at 12:17 AM Kyle Stanley  wrote:
>
>> Ivan Pozdeez wrote:
>> > I must note that names conforming to
>> https://www.python.org/dev/peps/pep-0008/#function-and-variable-names
>> would be "strip_prefix" and "strip_suffix".
>>
>> In this case, being in line with the existing string API method names
>> take priority over PEP 8, e.g. splitlines, startswith, endswith,
>> splitlines, etc. Although I agree that an underscore would probably be a
>> bit easier to read here, it would be rather confusing to randomly swap
>> between the naming convention for the same API. The benefit gained in 
>> *slightly
>> *easier readability wouldn't make up for the headache IMO.
>>
>> On Sun, Mar 22, 2020 at 12:13 AM Ivan Pozdeev via Python-Dev <
>> python-dev@python.org> wrote:
>>
>>> On 22.03.2020 6:38, Guido van Rossum wrote:
>>>
>>> On Sat, Mar 21, 2020 at 6:46 PM Nick Coghlan  wrote:
>>>
 On Sat., 21 Mar. 2020, 11:19 am Nathaniel Smith,  wrote:

> On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney
>  wrote:
> > This is a proposal to add two new methods, ``cutprefix`` and
> > ``cutsuffix``, to the APIs of Python's various string objects.
>
> The names should use "start" and "end" instead of "prefix" and
> "suffix", to reduce the jargon factor and for consistency with
> startswith/endswith.
>

 This would also be more consistent with startswith() & endswith(). (For
 folks querying this: the relevant domain here is "str builtin method
 names", and we already use startswith/endswith there, not
 hasprefix/hassuffix. The most challenging relevant audience for new str
 builtin method *names* is also 10 year olds learning to program in school,
 not adults reading the documentation)

>>>
>>> To my language sense, hasprefix/hassuffix are horrible compared to
>>> startswith/endswith. If you were to talk about this kind of condition using
>>> English instead of Python, you wouldn't say "if x has prefix y", you'd say
>>> "if x starts with y". (I doubt any programming language uses hasPrefix or
>>> has_prefix for this, making it a strawman.)
>>>
>>> *But*, what would you say if you wanted to express the idea of removing
>>> something from the start or end? It's pretty verbose to say "remove y from
>>> the end of x", and it's not easy to translate that into a method name.
>>> x.removefromend(y)? Blech! And x.removeend(y) has the double 'e', which
>>> confuses the reader.
>>>
>>> The thing is that it's hard to translate "starts" (a verb) into a noun
>>> -- the "start" of something is its very beginning (i.e., in Python,
>>> position zero), while a "prefix" is a noun that specifically describes an
>>> initial substring (and I'm glad we don't have to use *that* :-).
>>>
>>>
 I think the concern about stripstart() & stripend() working with
 substrings, while strip/lstrip/rstrip work with character sets, is valid,
 but I also share the concern about introducing "cut" as yet another verb to
 learn in the already wide string API.

>>>
>>> It's not great, and I actually think that "stripprefix" and
>>

[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Cameron Simpson

On 22Mar2020 05:09, Steven D'Aprano  wrote:
I agree with Ned -- whether the string object is returned unchanged or 
a copy is an implementation decision, not a language decision.


[Eric]

The only reason I can think of is to enable the test above: did a
suffix/prefix removal take place? That seems like a useful thing.


We don't make this guarantee about string identity for any other string
method, and CPython's behaviour varies from method to method:

   py> s = 'a b c'
   py> s is s.strip()
   True
   py> s is s.lower()
   False

and version to version:

   py> s is s.replace('a', 'a')  # 2.7
   False
   py> s is s.replace('a', 'a')  # 3.5
   True

I've never seen anyone relying on this behaviour, and I don't expect
these new methods will change that. Thinking that `is` is another way of
writing `==`, yes, I see that frequently. But relying on object identity
to see whether a new string was created by a method, no.


Well, ok, expressed on this basis, colour me convinced. I'm not ok with 
not mandating that no change to the string returns an equal string (but, 
really, _only_ because I can do a test with len(), as I consider a test 
of content wildly excessive - potentially quite expensive - strings are 
not always short).



If you want to know whether a prefix/suffix was removed, there's a more
reliable way than identity and a cheaper way than O(N) equality. Just
compare the length of the string before and after. If the lengths are
the same, nothing was removed.
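
A minimal sketch of that length check. The helper `cutprefix` is an illustrative stand-in that follows the PEP's "roughly equivalent" definition, and `prefix_was_removed` and the sample strings are made up for this example:

```python
def cutprefix(s: str, pre: str) -> str:
    """Illustrative stand-in: drop `pre` from the start of `s` if present."""
    return s[len(pre):] if s.startswith(pre) else s


def prefix_was_removed(before: str, after: str) -> bool:
    # len() on a str is O(1), and the check does not depend on whether
    # the method happened to return the very same object.
    return len(after) != len(before)


name = "test_parser"
stripped = cutprefix(name, "test_")
untouched = cutprefix(name, "spam_")

print(prefix_was_removed(name, stripped))   # True
print(prefix_was_removed(name, untouched))  # False
```

Note that an empty prefix also reports False under this check, since nothing is removed in that case either.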


Aye.

Cheers,
Cameron Simpson 
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TOO62DCWEANP23FN6MI4YIPQIIDAQ53U/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Cameron Simpson

On 21Mar2020 12:45, Eric V. Smith  wrote:

On 3/21/2020 12:39 PM, Victor Stinner wrote:

Well, if CPython is modified to implement tagged pointers and supports
storing short strings (a few latin1 characters) as a pointer, it may
become harder to keep the same behavior for "x is y" where x and y are
strings.


Are you suggesting that it could become impossible to write this 
function:


   def myself(o):
       return o

and not be able to rely on "o is myself(o)"? That seems... a pretty 
nasty breaking change for the language.


Good point. And I guess it's still a problem for interned strings, 
since even a copy could be the same object:



   >>> s = 'for'
   >>> s[:] is 'for'
   True

So I now agree with Ned, we shouldn't be prescriptive here, and we 
should explicitly say in the PEP that there's no way to tell if the 
strip/cut/whatever took place, other than comparing via equality, not 
identity.


Unless Victor asserts that a function like myself() above cannot be 
relied on to have its return value "is" its passed in value, I disagree.  
The beauty of returning the original object on no change is that the 
test is O(1) and the criterion is clear. It is easy to document that 
stripping an empty affix returns the original string.


I guess a test for len(stripped_string) == len(unstripped_string) is 
also O(1), and is less prescriptive. I just don't see the weight to 
Ned's characterisation of "a is/is-not b" as overly prescriptive; 
returning the same reference as one is given seems nearly the easiest 
thing a function can ever do.


Cheers,
Cameron Simpson 
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/22RNX6ABI7KATARTGJPHBI3OKAE4XHED/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Ivan Pozdeev via Python-Dev



On 22.03.2020 7:46, Steven D'Aprano wrote:

On Sun, Mar 22, 2020 at 06:57:52AM +0300, Ivan Pozdeev via Python-Dev wrote:


Does it need to be separate methods?

Yes.

Overloading a single method to do two dissimilar things is poor design.

They are similar. We're removing stuff from an edge in both cases. The only difference is whether input is treated as a character set or as 
a raw substring.

As written in the PEP preface, the very reason for the PEP is that people
are continuously trying to use *strip methods for the suggested
functionality -- which shows that this is where they are expecting to find
it.

They are only expecting to find it in strip() because there is no other
alternative where it could be. There's nothing inherent about strip that
means to delete a prefix or suffix, but when the only other choices are
such obviously wrong methods as upper(), find(), replace(), count() etc
it is easy to jump to the wrong conclusion that strip does what is
wanted.
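
To make the character-set versus substring distinction concrete, here is a small sketch. The helpers `cutprefix` and `cutsuffix` are illustrative stand-ins written from the PEP's "roughly equivalent" expressions, not an existing API:

```python
def cutprefix(s: str, pre: str) -> str:
    # Illustrative stand-in: remove `pre` once, as a whole substring.
    return s[len(pre):] if s.startswith(pre) else s


def cutsuffix(s: str, suf: str) -> str:
    # Illustrative stand-in: remove `suf` once, as a whole substring.
    return s[:-len(suf)] if suf and s.endswith(suf) else s


# rstrip/lstrip treat the argument as a *set of characters* and keep
# stripping until they hit a character outside that set.
print("text.txt".rstrip(".txt"))      # te
print("abcabcdef".lstrip("abc"))      # def

# The substring-based helpers remove exactly one occurrence of the affix.
print(cutsuffix("text.txt", ".txt"))  # text
print(cutprefix("abcabcdef", "abc"))  # abcdef
```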




--
Regards,
Ivan
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/V5N5K6WFWM4QPJ5YUGSCE6HY47P25PVG/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Cameron Simpson

On 21Mar2020 14:40, Eric V. Smith  wrote:

On 3/21/2020 2:09 PM, Steven D'Aprano wrote:
If you want to know whether a prefix/suffix was removed, there's a more
reliable way than identity and a cheaper way than O(N) equality. Just
compare the length of the string before and after. If the lengths are
the same, nothing was removed.


That's a good point. This should probably go in the PEP, and maybe the 
documentation.


+1000 to this. - Cameron
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GHBT5RREZRMKXZDE6ZG3EZGLU3CM7VNW/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Cameron Simpson

On 21Mar2020 14:17, mus...@posteo.org  wrote:

On Fri, 20 Mar 2020 20:49:12 -
"Dennis Sweeney"  wrote:

exactly same way (as a character set) in each case. Looking at how
the argument is used, I'd argue that ``lstrip``/``rstrip``/``strip``
are much more similar to each other than they are to the proposed
methods


Correct, but I don't like the word "cut" because it suggests that
something is cut into pieces which can be used later separately.

I'd propose to use "trim" instead of "cut" because it makes clear that
something is cut off and discarded, and it is clearly different from
"strip".


Please, NO. "trim" is a VERY well known PHP function, and does what our 
strip does. I'm very against this (otherwise fine) word for this 
reason.


I still prefer "cut", though the consensus seems to be for "strip".

Cheers,
Cameron Simpson 
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7P55K6BICBQ4YEKXD373SX2SRYRWKNU2/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Nick Coghlan
On Sun, 22 Mar 2020 at 14:01, Dennis Sweeney
 wrote:
>
> Is there a proven use case for anything other than the empty string as the 
> replacement? I prefer your "replacewhatever" to another "stripwhatever" name, 
> and I think it's clear and nicely fits the behavior you proposed. But should 
> we allow a naming convenience to dictate that the behavior should be 
> generalized to a use case we're not sure exists, where the same argument 
> is passed 99% of the time?

I think so, as if we don't, then we'd end up with the following three
methods on str objects (using Guido's suggested names of
"removeprefix" and "removesuffix", as I genuinely like those):

* replace()
* removeprefix()
* removesuffix()

And the following questions still end up with relatively non-obvious answers:

Q: How do I do a replace, but only at the start or end of the string?
A: Use "new_prefix + s.removeprefix(old_prefix)" or
"s.removesuffix(old_suffix) + new_suffix"

Q: How do I remove a substring from anywhere in a string, rather than
just from the start or end?
A: Use "s.replace(substr, '')"
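
The two answers above can be sketched as follows. The functions `removeprefix` and `removesuffix` here are plain-Python stand-ins using the names suggested in this thread, and the file names are made up for illustration:

```python
def removeprefix(s: str, prefix: str) -> str:
    # Stand-in for the proposed str method.
    return s[len(prefix):] if s.startswith(prefix) else s


def removesuffix(s: str, suffix: str) -> str:
    # Stand-in for the proposed str method; the truthiness check avoids s[:-0].
    return s[:-len(suffix)] if suffix and s.endswith(suffix) else s


old = "lib_utils.py"

# Replace only at the start or only at the end.
print("pkg_" + removeprefix(old, "lib_"))   # pkg_utils.py
print(removesuffix(old, ".py") + ".pyi")    # lib_utils.pyi

# If the old prefix is missing, the new one is still added.
print("pkg_" + removeprefix("utils.py", "lib_"))  # pkg_utils.py

# Remove a substring from anywhere in the string.
print("a-b-c".replace("-", ""))             # abc
```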

Most of that objection would go away if the PEP added a plain old
"remove()" method in addition to removeprefix() and removesuffix(),
though - the "replace the substring with an empty string" trick isn't
the most obvious spelling in the world, whereas I'd expect a lot of folks
to reach for "s.remove(substr)" based on the regular sequence API, and
I think Guido's right that in many cases where a prefix or suffix is
being changed, you also want to add it if the old prefix/suffix is
missing (and in the cases where you don't, you can either
use startswith()/endswith() first, or else check for a length change).

> I think a downside would be that a pass-a-string-or-a-tuple-of-strings 
> interface would be more mental effort to keep track of than a ``*args`` 
> variadic interface for "(cut/remove/without/trim)prefix", even if the former 
> is how ``startswith()`` works.

I doubt we'd use *args for any new string methods, precisely because
we don't use it for any of the existing ones.
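
For reference, a sketch of what a string-or-tuple interface modelled on ``startswith()`` might look like. The tuple form of ``str.startswith`` is existing behaviour; the `cutprefix` helper and its tuple handling are purely hypothetical and not something the PEP specifies:

```python
def cutprefix(s, prefixes):
    """Hypothetical helper: remove the first matching prefix.

    `prefixes` is either a single str or a tuple of str, mirroring the
    argument convention of str.startswith().
    """
    if isinstance(prefixes, str):
        prefixes = (prefixes,)
    for pre in prefixes:
        if s.startswith(pre):
            return s[len(pre):]
    return s


url = "https://python.org"
print(url.startswith(("http://", "https://")))  # True
print(cutprefix(url, ("http://", "https://")))  # python.org
print(cutprefix(url, "ftp://"))                 # https://python.org
```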

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ULNZVYYZBX6RHEAVWGO4AIDOQSNSCURJ/


[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Barney Gale
My 2c on the naming:

'start' and 'end' in 'startswith' and 'endswith' are verbs, whereas we're
looking for a noun if we want to cut/strip/trim a string. You can use
'start' and 'end' as nouns for this case but 'prefix' and 'suffix' seem a
more obvious choice in English to me.

Pathlib has `with_suffix()` and `with_name()`, which would give us
something like `without_prefix()` or `without_suffix()` in this case.
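
For comparison, a short sketch of the pathlib naming next to a string helper in the same style. `with_suffix()` and `with_name()` are existing pathlib methods; `without_suffix` is a hypothetical free function written only to show how the name would read:

```python
from pathlib import PurePosixPath

p = PurePosixPath("/tmp/report.txt")
print(p.with_suffix(".md"))      # /tmp/report.md
print(p.with_name("notes.txt"))  # /tmp/notes.txt


def without_suffix(s: str, suffix: str) -> str:
    # Hypothetical helper, named by analogy with pathlib's with_suffix().
    return s[:-len(suffix)] if suffix and s.endswith(suffix) else s


print(without_suffix("report.txt", ".txt"))  # report
```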

I think the name "strip", and the default (no-argument) behaviour of
stripping whitespace implies that the method is used to strip something
down to its bare essentials, like stripping a bed of its covers. Usually
you use strip() to remove whitespace and get to the real important data. I
don't think such an implication holds for removing a *specific*
prefix/suffix.

I also don't much like "strip" as the semantics are quite different - if
I'm understanding correctly, we're removing a *single* instance of a
*single* *multi-character* string. A verb like "trim" or "cut" seems
appropriate to highlight that difference.

Barney



On Fri, 20 Mar 2020 at 18:59, Dennis Sweeney 
wrote:

> Browser Link: https://www.python.org/dev/peps/pep-0616/
>
> PEP: 616
> Title: String methods to remove prefixes and suffixes
> Author: Dennis Sweeney 
> Sponsor: Eric V. Smith 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 19-Mar-2020
> Python-Version: 3.9
> Post-History: 30-Aug-2002
>
>
> Abstract
> ========
>
> This is a proposal to add two new methods, ``cutprefix`` and
> ``cutsuffix``, to the APIs of Python's various string objects.  In
> particular, the methods would be added to Unicode ``str`` objects,
> binary ``bytes`` and ``bytearray`` objects, and
> ``collections.UserString``.
>
> If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then
> ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has
> been removed.  If ``s`` does not have ``pre`` as a prefix, an
> unchanged copy of ``s`` is returned.  In summary, ``s.cutprefix(pre)``
> is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
>
> The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is
> roughly equivalent to
> ``s[:-len(suf)] if suf and s.endswith(suf) else s``.
>
>
> Rationale
> =========
>
> There have been repeated issues [#confusion]_ on the Bug Tracker
> and StackOverflow related to user confusion about the existing
> ``str.lstrip`` and ``str.rstrip`` methods.  These users are typically
> expecting the behavior of ``cutprefix`` and ``cutsuffix``, but they
> are surprised that the parameter for ``lstrip`` is interpreted as a
> set of characters, not a substring.  This repeated issue is evidence
> that these methods are useful, and the new methods allow a cleaner
> redirection of users to the desired behavior.
>
> As another testimonial for the usefulness of these methods, several
> users on Python-Ideas [#pyid]_ reported frequently including similar
> functions in their own code for productivity.  The implementation
> often contained subtle mistakes regarding the handling of the empty
> string (see `Specification`_).
>
>
> Specification
> =============
>
> The builtin ``str`` class will gain two new methods with roughly the
> following behavior::
>
>     def cutprefix(self: str, pre: str, /) -> str:
>         if self.startswith(pre):
>             return self[len(pre):]
>         return self[:]
>
>     def cutsuffix(self: str, suf: str, /) -> str:
>         if suf and self.endswith(suf):
>             return self[:-len(suf)]
>         return self[:]
>
> The only difference between the real implementation and the above is
> that, as with other string methods like ``replace``, the
> methods will raise a ``TypeError`` if any of ``self``, ``pre`` or
> ``suf`` is not an instance of ``str``, and will cast subclasses of
> ``str`` to builtin ``str`` objects.
>
> Note that without the check for the truthiness of ``suf``,
> ``s.cutsuffix('')`` would be mishandled and always return the empty
> string due to the unintended evaluation of ``self[:-0]``.
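
The pitfall described in the note above can be reproduced with a short sketch. Both function names are made up for illustration; the second one mirrors the PEP's "roughly equivalent" expression:

```python
def cutsuffix_naive(s: str, suf: str) -> str:
    # No emptiness check: every string ends with '', and s[:-0] is s[:0],
    # which is the empty string.
    return s[:-len(suf)] if s.endswith(suf) else s


def cutsuffix(s: str, suf: str) -> str:
    # The `suf and` guard skips the slice when the suffix is empty.
    return s[:-len(suf)] if suf and s.endswith(suf) else s


print(repr(cutsuffix_naive("spam", "")))  # ''
print(repr(cutsuffix("spam", "")))        # 'spam'
```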
>
> Methods with the corresponding semantics will be added to the builtin
> ``bytes`` and ``bytearray`` objects.  If ``b`` is either a ``bytes``
> or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
> will accept any bytes-like object as an argument.
>
> Note that the ``bytearray`` methods return a copy of ``self``; they do
> not operate in place.
>
> The following behavior is considered a CPython implementation detail,
> but is not guaranteed by this specification::
>
>     >>> x = 'foobar' * 10**6
>     >>> x.cutprefix('baz') is x is x.cutsuffix('baz')
>     True
>     >>> x.cutprefix('') is x is x.cutsuffix('')
>     True
>
> That is, for CPython's immutable ``str`` and ``bytes`` objects, the
> methods return the original object when the affix is not found or if
> the affix is empty.  Because these types test for equality using
> shortcuts for identity and length, the following equivalent
> expressions are e

[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

2020-03-21 Thread Nick Coghlan
On Sun, 22 Mar 2020 at 15:13, Cameron Simpson  wrote:
>
> On 21Mar2020 12:45, Eric V. Smith  wrote:
> >On 3/21/2020 12:39 PM, Victor Stinner wrote:
> >>Well, if CPython is modified to implement tagged pointers and supports
> >>storing short strings (a few latin1 characters) as a pointer, it may
> >>become harder to keep the same behavior for "x is y" where x and y are
> >>strings.
>
> Are you suggesting that it could become impossible to write this
> function:
>
>     def myself(o):
>         return o
>
> and not be able to rely on "o is myself(o)"? That seems... a pretty
> nasty breaking change for the language.

Other way around - because strings are immutable, their identity isn't
supposed to matter, so it's possible that functions that currently
return the exact same object in some cases may in the future start
returning a different object with the same value.

Right now, in CPython, with no tagged pointers, we return the full
existing pointer wherever we can, as that saves us a data copy. With
tagged pointers, the pointer storage effectively *is* the instance, so
you can't really replicate that existing "copy the reference not the
storage" behaviour any more.

That said, it's also possible that identity for tagged pointers would
be value based (similar to the effect of the small integer cache and
string interning), in which case the entire question would become
moot.

Either way, the PEP shouldn't be specifying that a new object *must*
be returned, and it also shouldn't be specifying that the same object
*can't* be returned.
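
A small sketch of why code should compare strings by value rather than identity. It uses `sys.intern` to show value-based identity of the kind mentioned above; the variable names are illustrative only, and whether `a is b` holds for the two runtime-built strings is deliberately left unasserted because it is an implementation detail:

```python
import sys

# Two equal strings built at runtime from different pieces.
a = "".join(["py", "thon"])
b = "".join(["pyth", "on"])

# Equality is what the language guarantees.
print(a == b)  # True

# Whether `a is b` holds here is up to the implementation, so nothing
# should rely on it. Interning, by contrast, makes identity follow value:
print(sys.intern(a) is sys.intern(b))  # True
```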

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NDRZ4G2S2GG74UYBCZ46N7QPL3SFFO5K/


[Python-Dev] Re: Proliferation of tstate arguments.

2020-03-21 Thread Nick Coghlan
On Sat, 21 Mar 2020 at 13:15, Stephen J. Turnbull
 wrote:
> That makes a lot of sense, as a strategy for doing the work.  It
> should be pretty straightforward to convert the tstate argument to a
> thread-local tstate.

Note that the thread locals are already there, as that's how the
public API already works. The problem is what to do when that tstate
*hasn't* been populated yet.

PEP 311's EnsureGIL APIs try to help with that problem, but they
assume there's only one interpreter, and implicitly create a new
tstate if one doesn't already exist.

The APIs that now accept an explicit tstate don't care, as you can
acquire that tstate from anywhere, it doesn't need to already be
active on the current thread.

The main working PEP at the moment is PEP 554, but the overall project
tying everything together is written up at
https://github.com/ericsnowcurrently/multi-core-python

> But ... this sounds to me like work that should be done on a branch.

The initial PEP 432 work to clean up interpreter initialisation did
live on a branch for a couple of years. The problem was that we kept
finding *bugs* and other behaviour problems that needed those
architectural changes to properly resolve (like Victor's PEP 540
"UTF-8 mode"), as well as an inability to properly *test* that we hadn't
broken anything without exposing the revised architecture to the full
complexity of CPython's real world use cases.

Thus the original switch to the "in-tree private API" approach as
described in https://www.python.org/dev/peps/pep-0432/#implementation-strategy
which ultimately led to the public initialisation API changes in PEP
587 (and the original 3.7.0 release *did* identify previously untested
embedding use cases that broke in 3.7.0 and were fixed in 3.7.1).

For PEP 554, the proposed public "interpreters" module is out of tree,
but the low level changes to fix assorted bugs in interpreter
finalization and the existing subinterpreter support are happening
in-tree.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XNKTOTKUFYCQI5VRKSG54SFVU74Y7X6I/