Derek Wilson added the comment:
Any update on this? Just so you can see what my workaround is, I'll paste in
the code I'm using. The major issue I have with it is that performance
doesn't scale to large strings.
It also works as both a bytes-to-bytes and a str-to-str transformation, since
that matches the kind of data you actually have in hand when you need this.
Having a full-fledged streaming codec to handle this would be very helpful when
writing applications that stream tab- and newline-separated UTF-8 data over
stdin/stdout.
text_types = (str, )

# Translate table: map each C0 control character to its backslash-escaped
# repr form (e.g. '\t', '\n', '\x01'), with shorter spellings for a few,
# and escape the backslash itself so the mapping is reversible.
escape_tm = dict((k, repr(chr(k))[1:-1]) for k in range(32))
escape_tm[0] = '\\0'
escape_tm[7] = '\\a'
escape_tm[8] = '\\b'
escape_tm[11] = '\\v'
escape_tm[12] = '\\f'
escape_tm[ord('\\')] = '\\\\'

def escape_control(s):
    if isinstance(s, text_types):
        return s.translate(escape_tm)
    else:
        return s.decode('utf-8', 'surrogateescape').translate(
            escape_tm).encode('utf-8', 'surrogateescape')

def unescape_control(s):
    if isinstance(s, text_types):
        return s.encode('latin1', 'backslashreplace').decode('unicode_escape')
    else:
        return s.decode('utf-8', 'surrogateescape').encode(
            'latin1', 'backslashreplace').decode(
            'unicode_escape').encode('utf-8', 'surrogateescape')
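
For context, here is roughly how these end up wired into a stream filter. This
is only a minimal sketch assuming newline-separated records of tab-separated
UTF-8 fields on stdin/stdout; process_stream and the record layout are
illustrative, not part of any proposed API:

import sys

def process_stream(in_stream=sys.stdin.buffer, out_stream=sys.stdout.buffer):
    # One record per line. Any tab/newline embedded inside a field arrives
    # as a backslash escape, so splitting on b'\t' and b'\n' here is safe;
    # unescape_control() then restores the literal characters.
    for raw_line in in_stream:
        fields = raw_line.rstrip(b'\n').split(b'\t')
        decoded = [unescape_control(f) for f in fields]
        # ... work on the decoded fields here ...
        out_stream.write(b'\t'.join(escape_control(f) for f in decoded) + b'\n')

Every field goes through two full decode/encode passes each way, which is
where the poor scaling on large strings comes from.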
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue18679>
_______________________________________