Re: [Tutor] clean text

spir Tue, 19 May 2009 11:23:16 -0700

On Tue, 19 May 2009 10:49:15 -0700,
Emile van Sebille <em...@fenx.com> wrote:


> On 5/19/2009 10:19 AM spir said...
> > On Tue, 19 May 2009 11:36:17 +0200,
> > spir <denis.s...@free.fr> wrote:
> > 
> > [...]
> > 
> > Thank you Albert, Kent, Sanders, Lie, Malcolm.
> > 
> > This time regex wins! I thought it wouldn't, because of the additional
> > function call (too bad we cannot pass a mapping to re.sub). Actually the
> > difference is very small ;-) The relevant change is indeed using a dict.
> > Replacing string concatenation with ''.join() is slower (also tested with
> > strings 10 and 100 times bigger). Strange... A membership test in a set
> > is only very slightly faster than in dict keys.
> 
> Hmm... this seems faster assuming it does the same thing...
> 
> xlate = dict( (chr(c),chr(c)) for c in range(256))
> xlate.update(control_char_map)
> 
> def cleanRepr5(text):
>      return "".join([ xlate[c] for c in text ])
> 
> 
> Emile

Thank you, Emile.
I thought of this solution (having a dict for all chars), but I cannot use it 
because later I will extend the app to cope with Unicode (~100,000 chars). So 
I really need to filter which chars have to be converted.
A useful aid, I guess, would be a builtin function that returns the conventional 
char/string repr without the surrounding quotes ('...').
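
A minimal sketch of that filtering idea, assuming Python 3. The mapping below is 
a hypothetical stand-in for the control_char_map from earlier in the thread 
(its real contents are not shown here), and clean_repr is an illustrative name. 
Only the characters listed in the mapping are converted; everything else, 
including any Unicode char, passes through unchanged, so no table of all chars 
is needed. Slicing the quotes off repr() is one way to approximate the wished-for 
builtin for single characters.

```python
# Hypothetical stand-in for the thread's control_char_map: map each ASCII
# control character (0-31 and 127) to its repr() with the quotes sliced off.
control_char_map = {
    chr(code): repr(chr(code))[1:-1]
    for code in list(range(32)) + [127]
}

def clean_repr(text):
    """Replace only the mapped characters; leave every other char as it is."""
    # dict.get(ch, ch) falls back to the character itself when it is unmapped.
    return "".join(control_char_map.get(ch, ch) for ch in text)
```

For example, clean_repr("a\tb") gives the four characters a, backslash, t, b.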

Denis

PS
By the way, you no longer need to build a list comprehension inside a call to a 
function that walks through a sequence:
   "".join( xlate[c] for c in text )
is a shortcut for
   "".join( (xlate[c] for c in text) )
[a generator expression that is the only argument of a call needs no additional 
parentheses of its own; with a second argument they are required -- see PEP 289 
http://www.python.org/dev/peps/pep-0289/]
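
A short sketch of the parenthesization rule, assuming Python 3. The small xlate 
dict and the sample text are made up for illustration (the thread's xlate covers 
all 256 chars), so the lookup here uses dict.get() to let unmapped chars through.

```python
# Hypothetical two-entry table; the xlate in the thread maps all 256 chars.
xlate = {"\t": "\\t", "\n": "\\n"}
text = "a\tb\n"

# List comprehension: builds the whole list before join() runs.
from_list = "".join([xlate.get(ch, ch) for ch in text])

# Generator expression as the sole argument: the call's own parentheses suffice.
from_genexp = "".join(xlate.get(ch, ch) for ch in text)

# Explicitly parenthesized generator expression: same result.
from_parenthesized = "".join((xlate.get(ch, ch) for ch in text))

# With a second argument, the generator expression needs its own parentheses;
# sum(ord(ch) for ch in "ab", 1000) would be a SyntaxError.
total = sum((ord(ch) for ch in "ab"), 1000)
```

All three join() calls produce the same string; only the last line shows the 
case where the extra parentheses cannot be dropped.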
------
la vita e estrany
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
