Re: [Tutor] clean text

2009-05-19 Emile van Sebille
On 5/19/2009 11:22 AM spir said...
> I thought of this solution (having a dict for all chars). But I cannot use it because later I will extend the app to cope with Unicode (~100,000 chars). So I really need to filter which chars have to be converted.

That seems somewhat of a premature op
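spir's worry is that a character-to-replacement dict would have to list every Unicode character. A minimal sketch, assuming Python 3 (the thread is Python 2 era), shows that this is not required with `str.translate`. The table only needs the code points to escape, and any character missing from it is copied through unchanged. `CLEAN_TABLE` and `clean_repr_translate` are hypothetical names.

```python
# Sketch assuming Python 3: str.translate takes a dict keyed by code point.
# Characters absent from the dict pass through untouched, so the table
# only lists the 65 control code points, however much Unicode the text holds.

CONTROL_CODES = list(range(32)) + list(range(127, 160))

# Hypothetical name: escape table built once, at import time.
CLEAN_TABLE = {n: repr(chr(n))[1:-1] for n in CONTROL_CODES}


def clean_repr_translate(text):
    """Return text with C0/C1 control chars replaced by their repr() escapes."""
    return text.translate(CLEAN_TABLE)


print(clean_repr_translate("a\tb\nc\x85\u00e9"))
```

The range matches the `(n < 32) or (n > 126 and n < 160)` test used in the thread. Non-ASCII letters such as `é` are not in the table, so they come through as-is.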

Re: [Tutor] clean text

2009-05-19 Alan Gauld
"spir" wrote:
> def _cleanRepr(text):
>     ''' text with control chars replaced by repr() equivalent '''
>     result = ""
>     for char in text:
>         n = ord(char)
>         if (n < 32) or (n > 126 and n < 160):
>             char = repr(char)[1:-1]
>         result += char
>     return result

I haven't read the rest of the r

Re: [Tutor] clean text

2009-05-19 Kent Johnson
On Tue, May 19, 2009 at 1:19 PM, spir wrote:
> Thank you Albert, Kent, Sanders, Lie, Malcolm.
>
> This time regex wins! Thought it wouldn't because of the additional func call
> (too bad we cannot pass a mapping to re.sub). Actually the diff. is very
> small ;-) The relevant change is indeed us
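The regex version itself is cut off here. `re.sub` cannot take a mapping directly, but it does accept a function that receives each match object, so a dict lookup can be wrapped in a small callable. The following is a minimal sketch under that assumption (Python 3). `_CONTROL`, `_ESCAPES` and `clean_repr_re` are hypothetical names, not the code posted in the thread.

```python
import re

# Character class covering code points 0-31 and 127-159, the same
# range as the (n < 32) or (n > 126 and n < 160) test in the thread.
_CONTROL = re.compile(r"[\x00-\x1f\x7f-\x9f]")

# Precomputed repr() escapes for exactly those characters.
_ESCAPES = {
    chr(n): repr(chr(n))[1:-1]
    for n in list(range(32)) + list(range(127, 160))
}


def clean_repr_re(text):
    """Escape control chars: one regex pass, one dict lookup per match."""
    # re.sub wants a string or a callable, so wrap the mapping in a lambda.
    return _CONTROL.sub(lambda m: _ESCAPES[m.group()], text)
```

One plausible reason this shape does well is that the regex engine scans runs of ordinary characters in C. The Python-level callback only runs for actual control characters, which are usually rare in source text.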

Re: [Tutor] clean text

2009-05-19 spir
On Tue, 19 May 2009 10:49:15 -0700, Emile van Sebille wrote:
> On 5/19/2009 10:19 AM spir said...
> > On Tue, 19 May 2009 11:36:17 +0200,
> > spir wrote:
> >
> > [...]
> >
> > Thank you Albert, Kent, Sanders, Lie, Malcolm.
> >
> > This time regex wins! Thought it wouldn't

Re: [Tutor] clean text

2009-05-19 Emile van Sebille
On 5/19/2009 10:19 AM spir said...
> On Tue, 19 May 2009 11:36:17 +0200, spir wrote:
>
> [...]
>
> Thank you Albert, Kent, Sanders, Lie, Malcolm.
>
> This time regex wins! Thought it wouldn't because of the additional func call (too bad we cannot pass a mapping to re.sub). Actually the diff. is

Re: [Tutor] clean text

2009-05-19 python
Denis,

Thank you for sharing your detailed analysis with the list. I'm glad I didn't bet money on the winner :) ... I'm just as surprised as you that the regex solution was the fastest.

Malcolm
___
Tutor maillist - Tutor@python.org
http://mail.pyt

Re: [Tutor] clean text

2009-05-19 spir
On Tue, 19 May 2009 11:36:17 +0200, spir wrote:
[...]

Thank you Albert, Kent, Sanders, Lie, Malcolm.

This time regex wins! Thought it wouldn't because of the additional func call (too bad we cannot pass a mapping to re.sub). Actually the diff. is very small ;-) The relevant change

Re: [Tutor] clean text

2009-05-19 Lie Ryan
spir wrote:
> Hello,
>
> This is a follow-up to the post on performance issues. Using a profiler, I realized that inside error message creation, most of the time was spent in a tool func used to clean up source text output. The issue is that when the source text holds control chars such as \n, then the er
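spir found the hotspot with a profiler. With the standard library, that kind of measurement can be sketched using `cProfile` and `pstats`. In this sketch, `build_error_message` is a hypothetical stand-in for the application's error-message code, and `_clean_repr` is a copy of the original character loop used as the profiling target.

```python
import cProfile
import io
import pstats


def _clean_repr(text):
    # Copy of the original loop from the thread, used as the profiling target.
    result = ""
    for char in text:
        n = ord(char)
        if (n < 32) or (n > 126 and n < 160):
            char = repr(char)[1:-1]
        result += char
    return result


def build_error_message(source):
    # Hypothetical stand-in for the app's error-message creation.
    return "syntax error in: " + _clean_repr(source)


profiler = cProfile.Profile()
profiler.enable()
for _ in range(200):
    build_error_message("line one\nline two\twith tab\n" * 50)
profiler.disable()

# Collect the report in a string and show the 5 most expensive entries
# by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Sorting by cumulative time puts the caller and the helper it spends its time in near the top. That is how a cleaning function buried inside error reporting would show up.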

Re: [Tutor] clean text

2009-05-19 python
Denis,

Untested idea:

1. Fill a dict with pre-calculated repr() values for chars you want to replace (replaceDict).
2. Create a set() of chars that you want to replace (replaceSet).
3. Replace the "if (n < 32) ..." test with "if char in replaceSet".
4. Look up the replacement via replaceDict[char] vs.
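The four steps above can be sketched as follows, assuming Python 3. `replaceDict` and `replaceSet` are the names from the post, and the character range is the one from spir's `_cleanRepr`. `clean_repr_lookup` is a hypothetical name.

```python
CONTROL_CODES = list(range(32)) + list(range(127, 160))

# Step 1: precompute the repr() escapes once instead of per character.
replaceDict = {chr(n): repr(chr(n))[1:-1] for n in CONTROL_CODES}

# Step 2: a set of the characters that need replacing.
replaceSet = set(replaceDict)


def clean_repr_lookup(text):
    result = []
    for char in text:
        # Step 3: a membership test replaces the ord() range comparisons.
        if char in replaceSet:
            # Step 4: fetch the precomputed escape.
            char = replaceDict[char]
        result.append(char)
    return "".join(result)
```

Testing `char in replaceDict` performs the same hash lookup, so the separate set is optional. `replaceDict.get(char, char)` would collapse steps 3 and 4 into one call.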

Re: [Tutor] clean text

2009-05-19 Kent Johnson
By the way, the timeit module is very helpful for comparing the speed of different implementations of an algorithm, such as those being presented in this thread. You can find examples in the list archives:
http://search.gmane.org/?query=timeit&group=gmane.comp.python.tutor

Kent
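As a rough illustration of that advice, here is a sketch assuming Python 3. It times two variants with `timeit.timeit`, which accepts a zero-argument callable as its statement. `clean_concat`, `clean_translate` and the sample text are hypothetical; swap in whichever implementations from the thread you want to compare.

```python
import timeit


def clean_concat(text):
    # spir's original loop: builds the result with +=
    result = ""
    for char in text:
        n = ord(char)
        if (n < 32) or (n > 126 and n < 160):
            char = repr(char)[1:-1]
        result += char
    return result


_TABLE = {n: repr(chr(n))[1:-1] for n in list(range(32)) + list(range(127, 160))}


def clean_translate(text):
    # Alternative: one str.translate call with a precomputed table.
    return text.translate(_TABLE)


sample = "some source text\n\twith control chars\x85\n" * 200

# Make sure the variants agree before comparing their speed.
assert clean_concat(sample) == clean_translate(sample)

for func in (clean_concat, clean_translate):
    seconds = timeit.timeit(lambda: func(sample), number=50)
    print(f"{func.__name__}: {seconds:.4f} s for 50 calls")
```

Checking that the variants produce identical output first avoids timing a "faster" version that is actually wrong.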

Re: [Tutor] clean text

2009-05-19 Sander Sweers
2009/5/19 spir:
> def _cleanRepr(text):
>     ''' text with control chars replaced by repr() equivalent '''
>     result = ""
>     for char in text:
>         n = ord(char)
>         if (n < 32) or (n > 126 and n < 160):
>             char = repr(char)[1:-1]

Re: [Tutor] clean text

2009-05-19 A.T.Hofkamp
spir wrote:
    def _cleanRepr(text):
        ''' text with control chars replaced by repr() equivalent '''
        chars = []
        for char in text:
            n = ord(char)
            if (n < 32) or (n > 126 and n < 160):
                char = repr(char)[1:-1]

Re: [Tutor] clean text

2009-05-19 Kent Johnson
On Tue, May 19, 2009 at 5:36 AM, spir wrote:
> Hello,
>
> This is a follow-up to the post on performance issues.
> Using a profiler, I realized that inside error message creation, most of the
> time was spent in a tool func used to clean up source text output.
> The issue is that when the source text h

Re: [Tutor] clean text

2009-05-19 spir
On Tue, 19 May 2009 11:36:17 +0200, spir wrote:
> Hello,
>
> This is a follow-up to the post on performance issues.
> Using a profiler, I realized that inside error message creation, most of
> the time was spent in a tool func used to clean up source text output. The
> issue is that when the