[issue22360] Adding manually offset parameter to str/bytes split function
New submission from Christoph Wruck:

Currently we have a "split" function which splits a str/bytes object into chunks of its underlying data. This works great for the most trivial jobs. But there is no way to pass an offset parameter to the split function that indicates the next "user-defined" starting index.

Currently, the position at which the search for the next separator starts is derived from the starting position of the last separator found plus the separator length. It should be possible to manipulate the next starting index by changing this behavior into:

last starting position (of found sep.) + separator length + OFFSET

NOTE: The slicing start index (for the substring) stays untouched.

This would help us split sequences that contain one or more consecutive separators.

The following demonstrates the current behavior:

>>> s = 'abc;;def;hij'
>>> s.split(';')
['abc', '', 'def', 'hij']

This works fine for both str and bytes values.

The following demonstrates an "offset variant" of the split function:

>>> s = 'abc;;def;hij'
>>> s.split(';', offset=1)
['abc', ';def', 'hij']

The maxcount parameter and a None separator should generate the same output as before.

A change would affect (as far as I can see):
- split.h
- split_char/rsplit_char
- split/rsplit

--
messages: 226564
nosy: cwr
priority: normal
severity: normal
status: open
title: Adding manually offset parameter to str/bytes split function
type: enhancement
versions: Python 3.5
___
Python tracker <http://bugs.python.org/issue22360>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
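The offset behavior proposed in the submission above can be approximated in pure Python. The following is only a minimal sketch, not the C implementation in split.h: the name `split_with_offset` is hypothetical, and it assumes that the offset applies only after a separator has been found (the first search starts at index 0) and that maxcount and a None separator are out of scope.

```python
def split_with_offset(s, sep, offset=0):
    """Hypothetical helper: split s on sep, but skip `offset` extra
    characters before searching for the next separator."""
    if not sep:
        raise ValueError("empty separator")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    chunks = []
    start = 0  # slicing start index of the current chunk (never shifted)
    pos = s.find(sep)  # assumption: the first search is not offset
    while pos != -1:
        chunks.append(s[start:pos])
        start = pos + len(sep)
        # Only the search position is moved by the offset.
        pos = s.find(sep, start + offset)
    chunks.append(s[start:])
    return chunks


print(split_with_offset('abc;;def;hij', ';'))            # same as str.split
print(split_with_offset('abc;;def;hij', ';', offset=1))  # keeps the second ';'
```

With offset=0 the sketch matches s.split(';') for the example string; it works for bytes as well, since it relies only on find() and slicing.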
[issue22360] Adding manually offset parameter to str/bytes split function
Christoph Wruck added the comment:

Hi Steven,

exactly, you're right about this.

'spam--eggs--cheesetoast'.split('-', offset=1) --> ['spam', '-eggs', '-cheese', '-', '-toast']

'spam--eggs--cheese--toast'.split('-', offset=8) --> ['spam', '-eggs--cheese', '-toast']

Okay, the name "offset" might be an unfortunate choice, and you are right that this could be hard for a caller to understand.

One more example: the following removes all escape signs so that the octal escape sequences can be processed in a second pass if the first three characters are digits.

'spam\\055055-eggs-rest'.split('\\', offset=1) --> ['spam', '055', '\\055-eggs-', '\\rest']

# could this speed up the built-in split func if a caller knows that every chunk is 3 chars long?
'tic-tac-toe'.split('-', offset=3)

A caller could use the offset parameter to keep all separators that lie between the last separator found and the offset, if they are part of a chunk. Or the caller may expect a separator followed by itself, where the second one should be kept (when in doubt, with an offset equal to the separator length).
[issue22360] Adding manually offset parameter to str/bytes split function
Christoph Wruck added the comment:

Serhiy, you would be right if one had to split a complex string, for example a string with more than one kind of separator. In that case I would prefer a regex-based solution too. Otherwise we could just as well use the re library for every one of those jobs, without using the fast built-in str/bytes split function. Unfortunately, re.split/findall lags behind the str/bytes split function.

--
resolution: rejected ->
status: closed -> open
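The speed difference mentioned in the comment above can be checked locally. The following is a rough sketch using timeit; the test string and the repetition count are arbitrary assumptions, and the measured numbers depend on the machine and the Python version.

```python
import re
import timeit

# Arbitrary test data (an assumption): a long string with consecutive separators.
s = ";".join(["abc", "", "def", "hij"] * 1000)

# For a plain single-character separator both calls return the same list.
assert s.split(";") == re.split(";", s)

t_str = timeit.timeit(lambda: s.split(";"), number=200)
t_re = timeit.timeit(lambda: re.split(";", s), number=200)
print(f"str.split: {t_str:.4f}s  re.split: {t_re:.4f}s")
```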
[issue22360] Adding manually offset parameter to str/bytes split function
Christoph Wruck added the comment:

David, I'll reflect on it.

@ALL - Thanks for all the answers. Should I close this ticket?
[issue27092] ord() raises TypeError in string input
New submission from Christoph Wruck:

Hi,

is there any reason why ord() raises a TypeError instead of a ValueError on string/bytes input of the wrong length? The chr() function raises a ValueError on negative integers such as chr(-1).

Required behaviour:

try:
    n = ord(input_string)
except ValueError as e:
    # it's a string/bytes-string, process potential escape sequence and
    # get an ordinal number of decoded escape sequence, otherwise raise
    ...

with kind regards,
Chris

--
components: Unicode
messages: 266142
nosy: cwr, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: ord() raises TypeError in string input
versions: Python 3.4
___
Python tracker <http://bugs.python.org/issue27092>
___
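The behaviour described in the report above can be reproduced, and the requested behaviour emulated, with a small wrapper. This is only a sketch: `ord_strict` is a hypothetical name, not a built-in or proposed API, and the error message text is made up for illustration.

```python
def ord_strict(value):
    """Hypothetical wrapper: like ord(), but raise ValueError when a
    str/bytes argument has the wrong length."""
    if isinstance(value, (str, bytes)) and len(value) != 1:
        raise ValueError(
            f"expected a single character, got length {len(value)}"
        )
    # Arguments of any other type still raise TypeError from ord() itself.
    return ord(value)


# Built-in behaviour as described in the report:
try:
    ord("ab")
except TypeError as exc:
    print("ord('ab') raises", type(exc).__name__)

try:
    chr(-1)
except ValueError as exc:
    print("chr(-1) raises", type(exc).__name__)
```

With such a wrapper, the try/except ValueError pattern from the report works without changing ord() itself.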
[issue27092] ord() raises TypeError in string input
Changes by Christoph Wruck:

--
type: -> enhancement
[issue27092] ord() raises TypeError in string input
Changes by Christoph Wruck:

--
type: enhancement -> behavior
[issue27092] ord() raises TypeError on string/bytes input
Christoph Wruck added the comment:

Closed as redundant to: http://bugs.python.org/issue27008

--
resolution: -> not a bug
status: open -> closed
title: ord() raises TypeError in string input -> ord() raises TypeError on string/bytes input