[issue37620] str.split(sep=None, maxsplit=-1,any=False)

2019-07-18 Thread Harry Coin


New submission from Harry Coin:

When I first read the str.split documentation, I took it to mean
'ab\t cd ef'.split(sep=' \t') --> ['ab','cd','ef']
especially since the example in the docs that uses <> as the separator
would give the documented result under that reading as well.

I suggest adding a parameter 'any=False', which by default keeps the current
behavior. When True, it would treat each character in the sep string as a
delimiter and drop any run of those characters from the resulting list.

The use cases are many, for example parsing the /etc/hosts file, where each
line has an address, then white space that can be any combination of \t and
' ', followed by more text.

One could imagine 'abc  \tdef, hgi,jlk'.split(', \t',any=True) -> 
['abc','def','hgi','jlk'] being used quite often.
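
For comparison, the proposed behavior can be approximated today with the re
module. This is only a sketch: split_any is a hypothetical helper name, not
part of str or the standard library, and it assumes empty fields are always
dropped and that maxsplit is not needed.

```python
import re


def split_any(text, seps):
    """Hypothetical stand-in for the proposed str.split(seps, any=True)."""
    if not seps:
        raise ValueError("empty separator")
    # Build a character class so that every character in seps is a
    # delimiter; re.escape guards characters such as ']' or '-'.
    pattern = "[" + re.escape(seps) + "]+"
    # Leading or trailing delimiters yield empty strings; drop them.
    return [field for field in re.split(pattern, text) if field]


fields = split_any('abc  \tdef, hgi,jlk', ', \t')
print(fields)
```

With the example string above this yields the four fields 'abc', 'def',
'hgi' and 'jlk'.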

--
components: Library (Lib)
messages: 348116
nosy: hcoin
priority: normal
severity: normal
status: open
title: str.split(sep=None, maxsplit=-1,any=False)
type: enhancement

___
Python tracker 
<https://bugs.python.org/issue37620>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37620] str.split(sep=None, maxsplit=-1,any=False)

2019-07-19 Thread Harry Coin

Harry Coin added the comment:

I suspect the number of times the str.split builtin was examined for use 
and rejected in favor of the much more complex and 'heavy' re module 
far, far exceeds the number of times it found use with more than one 
character in the split string.

The str.split documentation 'feels like' the Python equivalent of the 
Linux 'tr' utility, which treats the separator characters as a set instead 
of a sequence. The default behavior and the help(str.split) text 
encourage that intuition, because passing no sep behaves very 
differently: with no argument it 'removes any whitespace and discards 
empty strings from the result'. That leads one to suspect that each 
character in a sep string would be treated the same way.
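
A short illustration of the three behaviors in question, using only the 
existing str.split (the variable names are just for this example):

```python
text = "ab\t cd  ef"

# No sep: any run of whitespace separates fields, empty strings are dropped.
no_sep = text.split()

# Single-character sep: every occurrence splits, so empty strings can appear
# and the tab stays attached to the first field.
space_sep = text.split(" ")

# Multi-character sep: treated as one exact substring, not as a set.
multi_sep = text.split("\t ")

print(no_sep)
print(space_sep)
print(multi_sep)
```

Here no_sep is ['ab', 'cd', 'ef'], space_sep is ['ab\t', 'cd', '', 'ef'] 
and multi_sep is ['ab', 'cd  ef'].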

Mostly it is obvious from the use cases: you'd think Python would 
naturally do that in str.split. So many cases seek to resolve a string 
into a list of the interesting bits without regard to whatever mix of 
separators (tabs, spaces, etc.) was used to make the file more readable.

I think it would be a heavily used enhancement to add the 'any=True' 
parameter.

Or, as an alternative, allow the sep argument to be an iterable, so 
that:

'ab, cd'.split(sep=' ,') -->  ['ab, cd']

but

'ab, cd'.split(sep=[' ',',']) -> ['ab', 'cd']
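
A rough sketch of how that alternative could behave, written as a free 
function. split_on is a hypothetical name, and dropping empty fields in the 
iterable case is an assumption taken from the example above, not something 
str.split does today:

```python
import re


def split_on(text, sep):
    """Hypothetical: a str sep acts like str.split, an iterable acts as a set."""
    if isinstance(sep, str):
        # Unchanged behavior: the whole string is one exact separator.
        return text.split(sep)
    seps = list(sep)
    if not seps or "" in seps:
        raise ValueError("empty separator")
    # Try longer separators first so that ',,' wins over ',' if both are given.
    seps.sort(key=len, reverse=True)
    pattern = "(?:" + "|".join(re.escape(s) for s in seps) + ")+"
    return [field for field in re.split(pattern, text) if field]


print(split_on('ab, cd', ' ,'))
print(split_on('ab, cd', [' ', ',']))
```

The first call returns ['ab, cd'] because ' ,' never occurs as a substring; 
the second returns ['ab', 'cd'].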

On 7/19/19 1:34 PM, Serhiy Storchaka wrote:
> Serhiy Storchaka  added the comment:
>
> An alternative is to use regular expressions.
>
> >>> re.split('[\t ]+', 'ab\t cd ef')
> ['ab', 'cd', 'ef']
> .
>
> --
> nosy: +serhiy.storchaka
>
> ___
> Python tracker 
> <https://bugs.python.org/issue37620>
> ___

--
