Regular expression problem

2008-06-22 Thread abranches
Hello everyone.

I'm having a problem when extracting data from HTML with regular
expressions.
This is the source code:

You are ready in the next12M 48S

I need to get the remaining time. So far that is not a problem, but if
the remaining time is less than 60 seconds, the source becomes
something like this:

You are ready in the next36S

I'm using this regular expression, but the minutes are always None...
You are ready in the next.*?(?:>(\d+)M)?.*?(?:>(\d+)S)

If I remove the ? after the first group, the minutes are captured, but
then the pattern no longer matches when there are only seconds.
I could solve this problem in a couple of lines of Python, but I would
really like to solve it with a regular expression.
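
To make the behaviour concrete, here is a minimal sketch that reproduces
it. The <span> tags are an assumption, since the original markup did not
survive in the snippets above; the names with_minutes, seconds_only,
revised and remaining are hypothetical too. The likely cause is that the
first lazy .*? matches nothing, the optional minutes group is then tried
right after "next", fails there and is skipped, and the second .*?
consumes everything up to the seconds. One possible fix, shown as
"revised", moves the lazy prefix inside the optional group.

```python
import re

# Assumed markup: the real tags are unknown, <span> is only a stand-in.
with_minutes = "You are ready in the next<span>12M</span> <span>48S</span>"
seconds_only = "You are ready in the next<span>36S</span>"

# The pattern from the post: the optional group is attempted only at the
# position right after "next", so it matches empty and minutes stay None.
original = re.compile(r"You are ready in the next.*?(?:>(\d+)M)?.*?(?:>(\d+)S)")

# Possible fix: let the optional group do its own lazy scan for ">NNM".
revised = re.compile(r"You are ready in the next(?:.*?>(\d+)M)?.*?>(\d+)S")


def remaining(pattern, html):
    """Return the (minutes, seconds) groups, or None if nothing matches."""
    match = pattern.search(html)
    return match.groups() if match else None


print(remaining(original, with_minutes))  # (None, '48')
print(remaining(revised, with_minutes))   # ('12', '48')
print(remaining(revised, seconds_only))   # (None, '36')
```

Note that . does not match a newline unless re.DOTALL is passed, and
that the revised pattern could pick up an unrelated ">NNM" later on the
same line, so it may need tightening for the real page.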

Thanks,
Pedro Abranches
--
http://mail.python.org/mailman/listinfo/python-list


Print encoding problems in console

2011-07-15 Thread Pedro Abranches
Hello everyone.

I'm having a problem when outputting UTF-8 strings to a console.
Let me show a simple example that explains it:

$ python -c 'import sys; print sys.stdout.encoding; print u"\xe9"'
UTF-8
é

Everything is fine so far.
Now, if you call your Python script from a shell script, you might have
to store the output in a variable, like this:

$ var=`python -c 'import sys; print sys.stdout.encoding; print u"\xe9"'`

And what you get is:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

So Python is not able to detect the output encoding in a situation like
this, where the script is not run directly but inside backticks (``).

Why does this happen? Is there a way to solve it, either in Python or in
shell code?
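
For illustration, a likely explanation is that inside backticks stdout
is a pipe rather than a terminal, and in Python 2 sys.stdout.encoding is
then None, so print falls back to the ASCII default codec. The sketch
below shows one way to work around that by encoding explicitly; the
helper name encode_for_stream and the UTF-8 fallback are assumptions for
this example, not part of any library.

```python
import sys


def encode_for_stream(text, stream, fallback="utf-8"):
    """Encode text with the stream's encoding, or a fallback if unset.

    A stream that is a pipe may report encoding=None (as Python 2 does
    for a piped stdout), in which case the assumed fallback is used.
    """
    encoding = getattr(stream, "encoding", None) or fallback
    return text.encode(encoding)


class PipedStream(object):
    """Stand-in for a stdout with no detected encoding."""
    encoding = None


# With no detected encoding, the text is encoded as UTF-8 bytes.
data = encode_for_stream(u"\xe9", PipedStream())
print(repr(data))
```

The resulting bytes can be written with sys.stdout.write in Python 2, or
sys.stdout.buffer.write in Python 3. On the shell side, setting the
PYTHONIOENCODING environment variable (for example to utf-8) before
calling python is another option that avoids changing the script.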

Thanks,
Pedro Abranches