Regular expression problem
Hello everyone.

I'm having a problem extracting data from HTML with regular expressions. This is the source:

    You are ready in the next12M 48S

I need to get the remaining time. Up to this point there is no problem getting it, but if the remaining time is less than 60 seconds the source becomes something like this:

    You are ready in the next36S

I'm using this regular expression, but the minutes group is always None:

    You are ready in the next.*?(?:>(\d+)M)?.*?(?:>(\d+)S)

If I remove the ? from the first group it works, but then it no longer matches when there are only seconds.

I could solve this in a couple of lines of Python, but I would really like to solve it with a regular expression.

Thanks,
Pedro Abranches
--
http://mail.python.org/mailman/listinfo/python-list
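A likely reason the minutes group is always None: the first lazy `.*?` is satisfied by matching nothing, the optional group then fails at that position and is skipped, and the second `.*?` consumes everything up to the seconds. One possible fix is to move the lazy skip inside the optional group. The sketch below is only an illustration: the tags in the message were lost, so the `<span>` markup and the names `WITH_MINUTES`, `SECONDS_ONLY`, `FIXED`, and `remaining_seconds` are assumptions, not the real page or the poster's code.

```python
import re

# ASSUMPTION: hypothetical markup. The real tags are unknown; these samples
# were chosen only so that ">12M" and ">48S" appear, as the pattern expects.
WITH_MINUTES = 'You are ready in the next <span>12M</span> <span>48S</span>'
SECONDS_ONLY = 'You are ready in the next <span>36S</span>'

# Pattern from the question: the lazy ".*?" matches nothing, the optional
# minutes group is skipped, and the second ".*?" runs on to the seconds.
ORIGINAL = re.compile(r'You are ready in the next.*?(?:>(\d+)M)?.*?(?:>(\d+)S)')

# Possible fix: put the lazy skip inside the optional group, so the engine
# must try to reach the minutes before it is allowed to give up on them.
FIXED = re.compile(r'You are ready in the next(?:.*?>(\d+)M)?.*?>(\d+)S')


def remaining_seconds(html):
    """Return the remaining time in seconds, or None if nothing matches."""
    match = FIXED.search(html)
    if match is None:
        return None
    minutes, seconds = match.groups()
    # The minutes group is None when only seconds are present.
    return int(minutes or 0) * 60 + int(seconds)
```

With these sample strings, `remaining_seconds(WITH_MINUTES)` gives 768 and `remaining_seconds(SECONDS_ONLY)` gives 36. One caveat: if the same line contains a later `>…M` that belongs to something else, the optional group can overshoot and capture it, so the skip may need tightening once the real markup is known.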
Print encoding problems in console
Hello everyone.

I'm having a problem when outputting UTF-8 strings to a console. Here is a simple example that shows it:

    $ python -c 'import sys; print sys.stdout.encoding; print u"\xe9"'
    UTF-8
    é

Everything is fine. Now, if you use your Python script from a shell script, you may have to store its output in a variable, like this:

    $ var=`python -c 'import sys; print sys.stdout.encoding; print u"\xe9"'`

And what you get is:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

So Python is not able to detect the output encoding when the script is not called directly but inside backticks (``).

Why does this happen? Is there a way to solve it, either in Python or in the shell?

Thanks,
Pedro Abranches
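For context: inside backticks stdout is a pipe, not a terminal, and in that case Python 2 leaves `sys.stdout.encoding` as None and falls back to ASCII when printing unicode. On the shell side, setting the `PYTHONIOENCODING` environment variable (available since Python 2.6), for example `PYTHONIOENCODING=utf-8`, overrides this. On the Python side, one option is to encode explicitly instead of relying on detection. The helper below is a minimal sketch of that idea; the name `write_text` and the UTF-8 fallback are assumptions for illustration, not something from the original message.

```python
import sys


def write_text(text, stream=None, fallback="utf-8"):
    """Write text as bytes, choosing an encoding when the stream has none.

    ASSUMPTION: falling back to UTF-8 is a guess about what the consumer
    of the pipe expects; adjust it if the shell uses another encoding.
    """
    stream = sys.stdout if stream is None else stream
    # A piped stdout may report no encoding at all (None in Python 2).
    encoding = getattr(stream, "encoding", None) or fallback
    data = text.encode(encoding, "replace")
    # Flush pending text so bytes written below keep their order.
    stream.flush()
    # Text streams expose the underlying byte stream as ".buffer";
    # byte streams (and Python 2's sys.stdout) accept bytes directly.
    target = getattr(stream, "buffer", stream)
    target.write(data + b"\n")
    target.flush()
```

Calling `write_text(u"\xe9")` in place of `print u"\xe9"` then emits the encoded bytes whether or not the output is captured by the shell. Characters the chosen encoding cannot represent are replaced with `?` rather than raising UnicodeEncodeError.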
