[issue22360] Adding manually offset parameter to str/bytes split function

2014-09-08 Thread Christoph Wruck

New submission from Christoph Wruck:

Currently we have a "split" function which splits a str/bytes object into
chunks at each occurrence of a separator. This works well for most trivial
jobs, but there is no way to pass an offset parameter to the split function
to specify a "user-defined" starting index for the next search.

At present, the next starting position is built from the last starting
position (of found sep.) + separator length + 1.

It should be possible to manipulate the next starting index by changing this
behavior to:

last starting position (of found sep.) + separator length + OFFSET.

NOTE: The slicing start index (for the substring) stays untouched.

This would help with splitting sequences that contain one or more consecutive
separators. The following demonstrates the current behavior.

>>> s = 'abc;;def;hij'
>>> s.split(';')
['abc', '', 'def', 'hij']

This works fine for both str/bytes values.
The following demonstrates an "offset variant" of the split function.

>>> s = 'abc;;def;hij'
>>> s.split(';', offset=1)
['abc', ';def', 'hij']
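
The proposed semantics can be sketched in pure Python. This is only an
illustration of one reading of the proposal, not an implementation from the
tracker: the name split_with_offset is hypothetical, and it assumes that
offset=0 reproduces the current behavior and that the offset shifts only the
position where the search for the next separator resumes.

```python
def split_with_offset(s, sep, offset=0):
    """Hypothetical helper sketching the proposed offset parameter.

    Works for both str and bytes, since both provide find() and slicing.
    """
    if not sep:
        raise ValueError("empty separator")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    parts = []
    start = 0              # index the next chunk is sliced from
    pos = s.find(sep)      # the first search starts at the beginning
    while pos != -1:
        parts.append(s[start:pos])
        start = pos + len(sep)             # slicing start stays untouched
        pos = s.find(sep, start + offset)  # only the search start is shifted
    parts.append(s[start:])
    return parts


print(split_with_offset('abc;;def;hij', ';'))            # ['abc', '', 'def', 'hij']
print(split_with_offset('abc;;def;hij', ';', offset=1))  # ['abc', ';def', 'hij']
```

Under these assumptions the sketch returns the same list as s.split(';') when
the offset is 0, and the list shown above when the offset is 1.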

The behavior of the maxcount/None sep. parameters should stay unchanged and
generate the same output as before.

A change would affect (as far as I can see):
- split.h
- split_char/rsplit_char
- split/rsplit

--
messages: 226564
nosy: cwr
priority: normal
severity: normal
status: open
title: Adding manually offset parameter to str/bytes split function
type: enhancement
versions: Python 3.5

___
Python tracker 
<http://bugs.python.org/issue22360>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22360] Adding manually offset parameter to str/bytes split function

2014-09-08 Thread Christoph Wruck

Christoph Wruck added the comment:

Hi Steven,

exactly - you're right about this.

'spam--eggs--cheesetoast'.split('-', offset=1)
--> ['spam', '-eggs', '-cheese', '-', '-toast']

'spam--eggs--cheese--toast'.split('-', offset=8)
--> ['spam', '-eggs--cheese', '-toast']

Okay - the name "offset" might be an unfortunate choice, and you are right that
this could be hard for a caller to understand.

One more example:

The following removes all escape characters so that the octal escape sequences
can be processed in a second pass if the first three characters are digits.

'spam\\055055-eggs-rest'.split('\\', offset=1)
--> ['spam', '055', '\\055-eggs-', '\\rest']

# Could this speed up the built-in split function if the caller knows that
# every chunk is 3 chars long?
'tic-tac-toe'.split('-', offset=3)

A caller could use the offset parameter to keep all separators between
the last one found and the offset if they are part of a chunk, or if the
caller expects a separator followed by itself that should be kept - if in
doubt, with the same length as the separator.

--




[issue22360] Adding manually offset parameter to str/bytes split function

2014-09-08 Thread Christoph Wruck

Christoph Wruck added the comment:

Serhiy, you are right if you have to split a complex string, such as a string
with more than one separator. In that case I would prefer a regex-based
solution too. Otherwise we could simply use the re library for every one of
those jobs, without using the fast built-in str/bytes split function.
Unfortunately, re.split/findall lags behind the str/bytes split function.
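
A rough way to check this speed difference is a small timeit comparison. This
is only a sketch: the sample text, the single-character separator and the
repeat count are arbitrary assumptions, and the timings depend on the machine
and Python version.

```python
import re
import timeit

# Arbitrary sample input: 1000 short chunks joined by a one-character separator.
text = ";".join(["abc"] * 1000)
pattern = re.compile(";")

# Both approaches produce the same list for this simple separator.
assert text.split(";") == pattern.split(text)

t_str = timeit.timeit(lambda: text.split(";"), number=200)
t_re = timeit.timeit(lambda: pattern.split(text), number=200)
print(f"str.split: {t_str:.4f}s  re.split: {t_re:.4f}s")
```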

--
resolution: rejected -> 
status: closed -> open




[issue22360] Adding manually offset parameter to str/bytes split function

2014-09-08 Thread Christoph Wruck

Christoph Wruck added the comment:

David, I'll reflect on it. @ALL - Thanks for all the answers.
Should I close this ticket?

--




[issue27092] ord() raises TypeError in string input

2016-05-23 Thread Christoph Wruck

New submission from Christoph Wruck:

Hi,
is there any reason why ord() raises a TypeError instead of ValueError on 
string/bytes input with wrong length?

The chr() function will raise a ValueError on negative integers such as chr(-1).

Required behaviour:

try:
    n = ord(input_string)
except ValueError as e:
    # it's a string/bytes-string, process potential escape sequence and
    # get an ordinal number of decoded escape sequence, otherwise raise
    ...
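
The difference in behaviour can be reproduced with a few lines. The helper
names error_name and ord_or_value_error are hypothetical and exist only for
this illustration; the wrapper is one possible workaround under the
assumption that only wrong-length str/bytes input should become a ValueError.

```python
def error_name(func, arg):
    """Return the name of the exception raised by func(arg), or None."""
    try:
        func(arg)
    except Exception as exc:
        return type(exc).__name__
    return None


print(error_name(ord, "ab"))   # TypeError
print(error_name(ord, b"ab"))  # TypeError
print(error_name(chr, -1))     # ValueError


def ord_or_value_error(value):
    """Hypothetical wrapper: wrong-length str/bytes raises ValueError."""
    if isinstance(value, (str, bytes)) and len(value) != 1:
        raise ValueError(
            f"expected a single character, got length {len(value)}")
    return ord(value)  # any other type still raises TypeError
```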

with kind regards,
Chris

--
components: Unicode
messages: 266142
nosy: cwr, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: ord() raises TypeError in string input
versions: Python 3.4

___
Python tracker 
<http://bugs.python.org/issue27092>
___



[issue27092] ord() raises TypeError in string input

2016-05-23 Thread Christoph Wruck

Changes by Christoph Wruck :


--
type:  -> enhancement




[issue27092] ord() raises TypeError in string input

2016-05-23 Thread Christoph Wruck

Changes by Christoph Wruck :


--
type: enhancement -> behavior




[issue27092] ord() raises TypeError on string/bytes input

2016-05-23 Thread Christoph Wruck

Christoph Wruck added the comment:

closed as redundant to:

http://bugs.python.org/issue27008

--
resolution:  -> not a bug
status: open -> closed
title: ord() raises TypeError in string input -> ord() raises TypeError on 
string/bytes input
