On 23/07/13 04:14, Jim Mooney wrote:
> I tried translating the odd chars I found in my dos tree /f listing to
> symbols, but I'm getting this error. The chars certainly aren't over
> 10000. The ord is only 13 - so what's wrong here?
> def main():
>     zark = ''
>     for x in "ÀÄÄÄ":
>         zark += unichr(ord(x)-45)
This is broken in three ways that I can see.
Firstly, assuming you are using Python 2.7 (as you have said in the past),
"ÀÄÄÄ" does not mean what you think it means.
In Python 3, this is a Unicode string containing four individual characters:
LATIN CAPITAL LETTER A WITH GRAVE
LATIN CAPITAL LETTER A WITH DIAERESIS
LATIN CAPITAL LETTER A WITH DIAERESIS
LATIN CAPITAL LETTER A WITH DIAERESIS
Why you have duplicates, I do not know :-)
But in Python 2, that's not what you will get. What you get depends on your
environment, and is unpredictable. For example, on my system, using a Linux
terminal interactively with the terminal set to UTF-8, I get:
py> for c in "ÀÄ": # removing duplicates
...     print c, ord(c)
...
� 195
� 128
� 195
� 132
Yes, that's right, I get FOUR (not two) "characters" (actually bytes). But if I
change the terminal settings to, say, ISO-8859-7:
py> for c in "ΓΓ":
...     print c, ord(c)
...
Γ 195
128
Γ 195
132
the bytes stay the same (195, 128, 195, 132), but the *meaning* of those bytes
changes completely.
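The same point in a small Python 3 sketch: take the four bytes a UTF-8 terminal produces for "ÀÄ" and decode them once as UTF-8 and once as ISO-8859-7. Only the codec changes; the bytes do not. (Printing character names rather than the characters themselves keeps the output readable on any terminal.)

```python
import unicodedata

# The four bytes a UTF-8 terminal sends for "ÀÄ".
raw = b"\xc3\x80\xc3\x84"

for encoding in ("utf-8", "iso-8859-7"):
    text = raw.decode(encoding)
    # C1 control characters such as U+0080 have no name, so supply a fallback.
    names = [unicodedata.name(ch, "<control %#04x>" % ord(ch)) for ch in text]
    print(encoding, len(text), names)
```

Under UTF-8 you get two letters; under ISO-8859-7 you get four code points: GAMMA, a control character, GAMMA, another control character.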
So the point is: if you are running Python 2.7, what you get from a byte string
like "ÀÄ" is unpredictable. What you need is a Unicode string, u"ÀÄ", which will
be exactly what it looks like.
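In a script, as opposed to the interactive prompt, Python 2 also needs to be told how the source file is encoded before it will accept non-ASCII characters in it. A minimal sketch, assuming the file is saved as UTF-8; it runs under Python 2.7 and under Python 3.3 or later (which accepts the u prefix again):

```python
# -*- coding: utf-8 -*-
# The coding line tells Python 2 how to decode this source file;
# Python 3 assumes UTF-8 source anyway.
from __future__ import print_function

import unicodedata

text = u"ÀÄ"  # a Unicode literal: exactly two characters, U+00C0 and U+00C4
print(len(text))
for ch in text:
    print(hex(ord(ch)), unicodedata.name(ch))
```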
That's the first issue.
Second issue: you build up a string using this idiom:
zark = ''
for c in something:
    zark += c
Even though this works, it is a bad habit to get into and you should avoid
it: it risks being unpredictably slower than continental drift, in a way
that is *really* hard to diagnose. I've seen this fool the finest Python core
developers for *weeks*, over a reported bug where Python was painfully slow
for SOME Windows users but not all.
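Accumulating strings with + can be slow when there are a lot of them, because
each + copies everything accumulated so far into a brand-new string, so the
total work grows with the square of the number of pieces. It is a Shlemiel the
painter's algorithm: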
http://www.joelonsoftware.com/articles/fog0000000319.html
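To put rough numbers on it, here is a back-of-the-envelope model (a count, not a timing) of how many characters get copied. It assumes the worst case, where every `result = result + piece` builds a fresh string and copies both operands; the helper names are made up for illustration.

```python
def chars_copied_naive(pieces):
    """Characters copied if every + builds a brand-new string.

    Worst-case model with no in-place optimisation: each step copies
    everything accumulated so far plus the new piece.
    """
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        copied += length  # the new string must be filled with `length` characters
    return copied


def chars_copied_join(pieces):
    """Characters copied by ''.join(pieces): each piece is copied once."""
    return sum(len(piece) for piece in pieces)


pieces = ["x"] * 1000
print(chars_copied_naive(pieces))  # 500500
print(chars_copied_join(pieces))   # 1000
```

For n one-character pieces the naive total is n(n+1)/2, so doubling n roughly quadruples the work, while join stays linear.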
The reason it is sometimes *not* slow is that CPython 2.4 and later include a
clever optimization which can *sometimes* fix this issue, but it depends on
nothing else holding a reference to the string and on details of the operating
system's memory handling, and of course it doesn't apply to other
implementations like Jython, IronPython, PyPy and Nuitka.
So do yourself a favour and get out of the habit of accumulating strings in a
for loop using + since it will bite you one day. (Adding one or two strings is
fine.)
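The usual replacement is to collect the pieces in a list and glue them together once with str.join(). A small sketch (the word list is just an example) that runs on both Python 2.7 and 3:

```python
words = ["spam", "eggs", "ham"]

# Loop form: handy when each piece needs some work before it is kept.
parts = []
for word in words:
    parts.append(word.upper())
result = "-".join(parts)

# Equivalent one-liner using a generator expression.
result2 = "-".join(word.upper() for word in words)

print(result)  # SPAM-EGGS-HAM
```

Either way the final string is built in one pass, so it doesn't matter whether the interpreter has the += trick or not.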
Problem number three: you generate characters using this:
unichr(ord(x)-45)
but ord(x)-45 is a negative number whenever ord(x) is less than 45, and passing
a negative number to unichr() gives you exactly the result you see:
py> unichr(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
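You say the ord is 13. Character 13 is a carriage return, and DOS text lines end in "\r\n", so that is one plausible way to end up here. A Python 3 sketch reproducing it; in Python 3, chr() does the job of Python 2's unichr(), over the full range up to 0x10FFFF:

```python
ch = "\r"              # carriage return: ord("\r") == 13
code = ord(ch) - 45    # 13 - 45 == -32, a negative code point

error = None
try:
    chr(code)          # raises ValueError for any negative argument
except ValueError as exc:
    error = exc
print("ValueError:", error)

# A character at ord 45 or above does have a character 45 code points before it.
print(chr(ord("n") - 45))  # A
```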
(By the way, you're very naughty. The code you show *cannot possibly generate
the error you claim it generates*. Bad Jim, no biscuit!)
I don't understand what ord(x)-45 is intended to do. The effect is to give the
character 45 code points earlier, e.g. the character 45 places before 'n' is
'A'. But characters with ord() below 45 have nothing 45 places before them, so
you need to rethink what you are trying to do.
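One guess, which your message does not confirm: tree /f draws its branches with box-drawing characters from the DOS code page 437, where byte 0xC0 is └ and 0xC4 is ─. Read as Latin-1 (or Windows-1252), those same bytes display as À and Ä, which would also explain the run of duplicate Ä's. If that is what happened, the fix is to decode with the right codec rather than to shift code points. A Python 3 sketch, assuming code page 437 (the Windows command chcp reports the one your console actually uses):

```python
import unicodedata

# The characters as they appeared in the original message: "ÀÄÄÄ".
mystery = u"\u00c0\u00c4\u00c4\u00c4"

# Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0..255,
# so this recovers the original bytes: b'\xc0\xc4\xc4\xc4'.
raw = mystery.encode("latin-1")

# Reinterpret those bytes as DOS code page 437.
fixed = raw.decode("cp437")

for ch in fixed:
    print(hex(ord(ch)), unicodedata.name(ch))
```

If you are reading tree /f output from a file or a subprocess, decoding it with the console's code page in the first place avoids the mangling altogether.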
--
Steven