On Tue, May 19, 2009 at 5:36 AM, spir <denis.s...@free.fr> wrote:
> Hello,
>
> This is a follow-up to the post on performance issues.
> Using a profiler, I realized that inside error message creation, most of the 
> time was spent in a tool func used to clean up source text output.
> The issue is that when the source text holds control chars such as \n, then 
> the error message is hardly readable. My solution is to replace such chars 
> with their repr():
>
> def _cleanRepr(text):
>        ''' text with control chars replaced by repr() equivalent '''
>        result = ""
>        for char in text:
>                n = ord(char)
>                if (n < 32) or (n > 126 and n < 160):
>                        char = repr(char)[1:-1]
>                result += char
>        return result
>
> For some reason, this func is extremely slow. While the rest of error message 
> creation looks very complicated, this seemingly innocent func consumes > 90% 
> of the time. The issue is that I cannot use repr(text), because repr will 
> replace all non-ASCII characters. I need to replace only control characters.
> How else could I do that?

I would get rid of the calls to ord() and repr() to start. There is no
need for ord() at all, just compare characters directly:
if char < '\x20' or '\x7e' < char < '\xa0':
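
For illustration only, here is that comparison wrapped in a small helper. The name is_control is made up for this sketch, and it assumes it is passed a single-character string:

```python
def is_control(char):
    # Same ranges as the ord() test in the original loop:
    # below 0x20, or strictly between 0x7e and 0xa0 (that is, 0x7f-0x9f).
    return char < '\x20' or '\x7e' < char < '\xa0'
```

String comparison of single characters orders them by code point, so this should give the same answer as the ord() version without the extra function call.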

To eliminate repr() you could create a dict mapping chars to their
repr and look up in that.
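
A minimal sketch of the dict idea on its own, still looping in Python. The names control_repr and clean_repr_dict are hypothetical, and using ''.join() instead of repeated += is an extra change not discussed above:

```python
# Precompute the escaped form of every control character once.
control_repr = {}
for i in list(range(0, 32)) + list(range(127, 160)):
    c = chr(i)
    control_repr[c] = repr(c)[1:-1]

def clean_repr_dict(text):
    # dict.get() falls back to the character itself when it is not a
    # control char, so no comparison and no repr() call is needed per char.
    return ''.join(control_repr.get(char, char) for char in text)
```

This removes the per-character calls to ord() and repr(), but the loop itself still runs in Python rather than in C.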

You should also look for a way to get the loop into C code. One way to
do that would be to use a regex to search and replace. The replacement
argument in a call to re.sub() can be a function that takes a Match
object; that function can do the dict lookup. Here is something to try:

import re

# Create a dictionary of replacement strings
repl = {}
for i in list(range(0, 32)) + list(range(127, 160)):
    c = chr(i)
    repl[c] = repr(c)[1:-1]

def sub(m):
    ''' Helper function for re.sub(). m will be a Match object. '''
    return repl[m.group()]

# Regex to match control chars
controlsRe = re.compile(r'[\x00-\x1f\x7f-\x9f]')

def replaceControls(s):
    ''' Replace all control chars in s with their repr() '''
    return controlsRe.sub(sub, s)


for s in [
    'Simple test',
    'Harder\ntest',
    'Final\x00\x1f\x20\x7e\x7f\x9f\xa0test'
]:
    print(replaceControls(s))
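
To check that a regex version agrees with the original loop, and to get a rough idea of the speed difference, something like the following could be used. It is only a sketch: it is self-contained and written for Python 3, the snake_case names and the sample text are made up for the example, and no timings are claimed since they depend on the machine and Python version.

```python
import re
import timeit

def clean_repr_loop(text):
    # The approach from the original post: ord() and repr() per character.
    result = ""
    for char in text:
        n = ord(char)
        if (n < 32) or (n > 126 and n < 160):
            char = repr(char)[1:-1]
        result += char
    return result

# Lookup table and regex covering the same ranges (0x00-0x1f, 0x7f-0x9f).
repl = {}
for i in list(range(0, 32)) + list(range(127, 160)):
    repl[chr(i)] = repr(chr(i))[1:-1]
controls_re = re.compile(r'[\x00-\x1f\x7f-\x9f]')

def replace_controls(s):
    # re.sub() calls the function once per match and inserts its return value.
    return controls_re.sub(lambda m: repl[m.group()], s)

# Hypothetical sample: the last test string above, repeated to give some bulk.
sample = 'Final\x00\x1f\x20\x7e\x7f\x9f\xa0test\n' * 200

# Both versions should produce identical output.
assert clean_repr_loop(sample) == replace_controls(sample)

if __name__ == '__main__':
    for func in (clean_repr_loop, replace_controls):
        seconds = timeit.timeit(lambda: func(sample), number=100)
        print('%-18s %.4f s' % (func.__name__, seconds))
```

How much faster the regex version is will depend on how many control characters the text contains, since each match still calls back into Python.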


Kent
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
