Printing characters outside of the ASCII range

2012-11-09 Thread danielk
I'm converting an application to Python 3. The app works fine on Python 2.

Simply put, this one-liner:

print(chr(254))

errors out with:

Traceback (most recent call last):
  File "D:\home\python\tst.py", line 1, in <module>
    print(chr(254))
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>

I'm using this character as a delimiter in my application.

What do I have to do to convert this string so that it does not error out?
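The failure can be reproduced without a console. This is a minimal sketch, assuming (as the traceback shows) that the console's stdout encoding is cp437. `chr(254)` is the Unicode character U+00FE (LATIN SMALL LETTER THORN), and cp437 has no byte for it. Latin-1, by contrast, maps it to the single byte 0xFE. `errors="replace"` is a standard `str.encode` option that substitutes `?` instead of raising.

```python
# chr(254) is a Unicode code point, not a byte: U+00FE LATIN SMALL LETTER THORN.
ch = chr(254)

# Latin-1 maps code points 0-255 to the same byte values.
latin1_bytes = ch.encode("latin-1")

# cp437 (the console encoding in the traceback) has no byte for U+00FE.
try:
    ch.encode("cp437")
    cp437_failed = False
except UnicodeEncodeError as exc:
    cp437_failed = True
    print(exc)  # 'charmap' codec can't encode character '\xfe' ...

# errors="replace" writes '?' for unencodable characters instead of raising.
replaced = ch.encode("cp437", errors="replace")
print(latin1_bytes, replaced)
```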
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Printing characters outside of the ASCII range

2012-11-09 Thread danielk
On Friday, November 9, 2012 12:48:05 PM UTC-5, Dave Angel wrote:
> On 11/09/2012 12:17 PM, danielk wrote:
> > I'm converting an application to Python 3. The app works fine on Python 2.
> >
> > Simply put, this simple one-liner:
> >
> > print(chr(254))
> >
> > errors out with:
> >
> > Traceback (most recent call last):
> >   File "D:\home\python\tst.py", line 1, in <module>
> >     print(chr(254))
> >   File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
> >     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> > UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in
> > position 0: character maps to <undefined>
> >
> > I'm using this character as a delimiter in my application.
> >
> > What do I have to do to convert this string so that it does not error out?
>
> What character do you want?  What characters does your console handle
> directly?  What does a "delimiter" mean for your particular console?
>
> Or are you just printing it for the fun of it, and the real purpose is
> for further processing, which will not go to the console?
>
> What kind of things will it be separating?  (strings, bytes ?)  Clearly
> you originally picked it as something unlikely to occur in those elements.
>
> When those things are combined with a separator between, how are the
> results going to be used?  Saved to a file?  Printed to console?  What?
>
> --
> DaveA

The database I'm using stores information as a 3-dimensional array. The 
delimiters between elements are chr(252), chr(253) and chr(254). So a record 
can look like this (example only uses one of the delimiters for simplicity):

name + chr(254) + address + chr(254) + city + chr(254) + st + chr(254) + zip

The other delimiters can be embedded within each field. For example, if there 
were multiple addresses for 'name' then the 'address' field would look like 
this:

addr1 + chr(253) + addr2 + chr(253) + addr3 + etc ...
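A record in this layout can be split into nested lists with plain `str.split`. The following is a minimal sketch. It assumes chr(254) separates fields and chr(253) separates multiple values within a field, as described above. It also assumes chr(252) nests one level deeper than chr(253), which the post does not spell out. The constant and function names are hypothetical.

```python
from typing import List

# Hypothetical names for the three delimiters described above.
FIELD_MARK = chr(254)     # separates fields: name, address, city, st, zip
VALUE_MARK = chr(253)     # separates multiple values inside one field
SUBVALUE_MARK = chr(252)  # assumed: separates sub-values inside one value


def parse_record(record: str) -> List[List[List[str]]]:
    """Split a record into fields -> values -> sub-values."""
    return [
        [value.split(SUBVALUE_MARK) for value in field.split(VALUE_MARK)]
        for field in record.split(FIELD_MARK)
    ]


def build_record(fields: List[List[List[str]]]) -> str:
    """Inverse of parse_record: join the nested lists back into one string."""
    return FIELD_MARK.join(
        VALUE_MARK.join(SUBVALUE_MARK.join(subs) for subs in values)
        for values in fields
    )


record = "Smith" + FIELD_MARK + "addr1" + VALUE_MARK + "addr2" + FIELD_MARK + "Springfield"
parsed = parse_record(record)
# [[['Smith']], [['addr1'], ['addr2']], [['Springfield']]]
```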

I use Python to connect to the database using subprocess.Popen to run a server 
process. Python requests 'actions' like 'read' and 'write' to the server 
process, whereby the server process performs the actions. Some actions require 
that the server send back information in the form of records that contain those 
delimiters.

I have __str__ and __repr__ methods in the classes but Python is choking on
those characters. Surely, I could convert those characters on the server before
sending them to Python, and that is probably what I'm going to do, so I guess
I've answered my own question. On Python 2, it just printed the 'extended'
ASCII representation.

I guess the question I have is: How do you tell Python to use a specific 
encoding for 'print' statements when I know there will be characters outside of 
the ASCII range of 0-127?
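In Python 3, `print()` writes `str` to `sys.stdout`. That object is a text stream that encodes with its own `encoding` and `errors` settings. This minimal sketch shows two standard options. The first wraps a byte stream in `io.TextIOWrapper` with an explicit encoding. The second encodes the text yourself and writes the bytes to the underlying buffer (`sys.stdout.buffer` for the real console).

An in-memory `io.BytesIO` stands in for the console here so the result can be inspected. On Python 3.7 and later, `sys.stdout.reconfigure(encoding=..., errors=...)` changes the real stream in place. The `PYTHONIOENCODING` environment variable sets it at startup.

```python
import io

record = "name" + chr(254) + "address"

# Stand-in for the console's byte stream.
raw = io.BytesIO()

# A text stream with an explicit encoding; errors="replace" writes '?'
# for characters cp437 cannot represent instead of raising.
# newline="\n" disables os.linesep translation so the bytes are predictable.
out = io.TextIOWrapper(raw, encoding="cp437", errors="replace", newline="\n")
print(record, file=out)
out.flush()
cp437_output = raw.getvalue()

# Alternative: encode explicitly and write the bytes yourself
# (with the real console this would be sys.stdout.buffer.write(...)).
raw2 = io.BytesIO()
raw2.write(record.encode("latin-1") + b"\n")
latin1_output = raw2.getvalue()
```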



Re: Printing characters outside of the ASCII range

2012-11-09 Thread danielk
On Friday, November 9, 2012 4:34:19 PM UTC-5, Prasad, Ramit wrote:
> danielk wrote:
> >
> > The database I'm using stores information as a 3-dimensional array. The
> > delimiters between elements are chr(252), chr(253) and chr(254). So a
> > record can look like this (example only uses one of the delimiters for
> > simplicity):
> >
> > name + chr(254) + address + chr(254) + city + chr(254) + st + chr(254) + zip
> >
> > The other delimiters can be embedded within each field. For example, if
> > there were multiple addresses for 'name' then the 'address' field would
> > look like this:
> >
> > addr1 + chr(253) + addr2 + chr(253) + addr3 + etc ...
> >
> > I use Python to connect to the database using subprocess.Popen to run a
> > server process. Python requests 'actions' like 'read' and 'write' to the
> > server process, whereby the server process performs the actions. Some
> > actions require that the server send back information in the form of
> > records that contain those delimiters.
> >
> > I have __str__ and __repr__ methods in the classes but Python is choking on
> > those characters. Surely, I could convert those characters on the server
> > before sending them to Python and that is what I'm probably going to do,
> > so guess I've answered my own question. On Python 2, it just printed the
> > 'extended' ASCII representation.
> >
> > I guess the question I have is: How do you tell Python to use a specific
> > encoding for 'print' statements when I know there will be characters
> > outside of the ASCII range of 0-127?
>
> You just need to change the string to one that is not
> trying to use the ASCII codec when printing.
>
> print(chr(253).decode('latin1')) # change latin1 to your chosen encoding.
> ý
>
> ~Ramit

D:\home\python>pytest.py
Traceback (most recent call last):
  File "D:\home\python\pytest.py", line 1, in <module>
    print(chr(253).decode('latin1'))
AttributeError: 'str' object has no attribute 'decode'

Do I need to import something?
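No import is needed. The method simply lives on the other type. In Python 3, `str` (text) has `encode()` and `bytes` has `decode()`. The round trip looks like the following minimal sketch, which uses Latin-1 only because that codec maps code points 0-255 to the same byte values.

```python
text = chr(253)                  # 'ý', U+00FD
data = text.encode("latin-1")    # str -> bytes
back = data.decode("latin-1")    # bytes -> str

print(hasattr(str, "decode"))    # False: str has no decode() in Python 3
print(hasattr(bytes, "encode"))  # False: bytes has no encode()
```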


Re: Printing characters outside of the ASCII range

2012-11-11 Thread danielk
On Friday, November 9, 2012 5:11:12 PM UTC-5, Ian wrote:
> On Fri, Nov 9, 2012 at 2:46 PM, danielk wrote:
> > D:\home\python>pytest.py
> > Traceback (most recent call last):
> >   File "D:\home\python\pytest.py", line 1, in <module>
> >     print(chr(253).decode('latin1'))
> > AttributeError: 'str' object has no attribute 'decode'
> >
> > Do I need to import something?
>
> Ramit should have written "encode", not "decode".  But the above still
> would not work, because chr(253) gives you the character at *Unicode*
> code point 253, not the character with CP437 ordinal 253 that your
> terminal can actually print.  The Unicode equivalents of those
> characters are:
>
> >>> list(map(ord, bytes([252, 253, 254]).decode('cp437')))
> [8319, 178, 9632]
>
> So these are what you would need to encode to CP437 for printing.
>
> >>> print(chr(8319))
> ⁿ
> >>> print(chr(178))
> ²
> >>> print(chr(9632))
> ■
>
> That's probably not the way you want to go about printing them,
> though, unless you mean to be inserting them manually.  Is the data
> you get from your database a string, or a bytes object?  If the
> former, just do:
>
> print(data.encode('cp437'))
>
> If the latter, then it should be printable as is, unless it is in some
> other encoding than CP437.
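Two details here are worth checking. First, the three cp437 delimiter bytes decode to U+207F, U+00B2 and U+25A0, and encoding those characters back to cp437 restores the original bytes.

Second, in Python 3 `print()` converts a `bytes` argument with `str()`. That yields its repr (`b'...'`) rather than the glyphs. To send raw bytes to the console, they have to be written to the binary buffer with `sys.stdout.buffer.write(...)`. A minimal sketch:

```python
delims = bytes([252, 253, 254])

# cp437 bytes -> the Unicode characters the console actually shows.
chars = delims.decode("cp437")
codepoints = [ord(c) for c in chars]

# Encoding those characters back to cp437 gives the original bytes.
round_trip = chars.encode("cp437")

# str() of a bytes object is its repr, which is what print() would show.
shown = str("abc\u25a0def".encode("cp437"))
```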

Ian's solution gives me what I need (thanks Ian!). But I notice a difference 
between '__str__' and '__repr__'.

class Pytest(str):
    def __init__(self, data = None):
        if data == None: data = ""
        self.data = data

    def __repr__(self):
        return (self.data).encode('cp437')

>>> import pytest
>>> p = pytest.Pytest("abc" + chr(178) + "def")
>>> print(p)
abc²def
>>> print(p.data)
abc²def
>>> print(type(p.data))
<class 'str'>

If I change '__repr__' to '__str__' then I get:

>>> import pytest
>>> p = pytest.Pytest("abc" + chr(178) + "def")
>>> print(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __str__ returned non-string (type bytes)

Why is '__str__' behaving differently than '__repr__'? I'd like to be able to
use '__str__' because the result is not executable code; it's just a string of
the record contents.

The documentation for the 'encode' method says: "Return an encoded version of
the string as a bytes object." Yet when I displayed the type, it said it was
<class 'str'>, which I'm taking to be 'type string', or can a 'string' also be
'a string of bytes'?
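Here is what seems to be happening. `print()` calls `str()` on its argument. Because `Pytest` subclasses `str` without overriding `__str__`, the inherited `str.__str__` returns the string value, so the custom `__repr__` is never called at all. Once `__str__` is overridden, its result is used, and both `__str__` and `__repr__` must return `str`, never `bytes`.

`type(p.data)` reports `str` because `p.data` is the original argument, not the encoded result. A minimal sketch (the class names are hypothetical):

```python
class ReprBytes(str):
    """Mirrors the Pytest example: __repr__ (incorrectly) returns bytes."""
    def __repr__(self):
        return self.encode("cp437")


class Record(str):
    """Hypothetical fix: both methods return str; encoding is left to output."""
    def __str__(self):
        return str.__str__(self)                # the raw contents, as plain str

    def __repr__(self):
        return f"Record({str.__repr__(self)})"  # e.g. Record('abc²def')


p = ReprBytes("abc" + chr(178) + "def")
via_str = str(p)   # inherited str.__str__ is used; __repr__ is never called

try:
    repr(p)
    repr_failed = False
except TypeError:  # "__repr__ returned non-string (type bytes)"
    repr_failed = True

q = Record("abc" + chr(178) + "def")
```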

I'm trying to get my head around all this codecs/unicode stuff. I haven't had 
to deal with it until now but I'm determined to not let it get the best of me 
:-)

My goals are:

a) display a 'raw' database record with the delimiters intact, and
b) allow the client to create a string that represents a database record. So, 
if they know the record format then they should be able to create a database 
object like it does above, but with the chr(25x) characters. I will handle the 
conversion of the chr(25x) characters internally.
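For goal (a), one option is to keep the record as a normal `str` containing chr(252)-chr(254) and translate only when displaying it. This is a minimal sketch, assuming the console is cp437 as in the earlier traceback. Each delimiter is mapped to the character that the same cp437 byte stands for (U+207F, U+00B2, U+25A0), which reproduces what Python 2 showed, and the display string then encodes to cp437 without errors. The translation is for display only; the record itself keeps its delimiters. `DISPLAY_MAP` and `display` are hypothetical names.

```python
# Map chr(252..254) to the characters the cp437 bytes 252..254 represent.
DISPLAY_MAP = str.maketrans({
    chr(252): bytes([252]).decode("cp437"),  # U+207F SUPERSCRIPT LATIN SMALL LETTER N
    chr(253): bytes([253]).decode("cp437"),  # U+00B2 SUPERSCRIPT TWO
    chr(254): bytes([254]).decode("cp437"),  # U+25A0 BLACK SQUARE
})


def display(record: str) -> str:
    """Return a copy of record that a cp437 console can print."""
    return record.translate(DISPLAY_MAP)


# Built by the client with the chr(25x) delimiters (goal b).
record = "name" + chr(254) + "addr1" + chr(253) + "addr2"
shown = display(record)
console_bytes = shown.encode("cp437")   # what a cp437 console receives
```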


Re: Encoding conundrum

2012-11-21 Thread danielk
On Tuesday, November 20, 2012 6:03:47 PM UTC-5, Ian wrote:
> On Tue, Nov 20, 2012 at 2:49 PM, Daniel Klein wrote:
> > With the assistance of this group I am understanding unicode encoding issues
> > much better; especially when handling special characters that are outside of
> > the ASCII range. I've got my application working perfectly now :-)
> >
> > However, I am still confused as to why I can only use one specific encoding.
> >
> > I've done some research and it appears that I should be able to use any of
> > the following codecs with codepoints '\xfc' (chr(252)) '\xfd' (chr(253)) and
> > '\xfe' (chr(254)) :
>
> These refer to the characters with *Unicode* codepoints 252, 253, and 254:
>
> >>> unicodedata.name('\xfc')
> 'LATIN SMALL LETTER U WITH DIAERESIS'
> >>> unicodedata.name('\xfd')
> 'LATIN SMALL LETTER Y WITH ACUTE'
> >>> unicodedata.name('\xfe')
> 'LATIN SMALL LETTER THORN'
>
> > ISO-8859-1   [ note that I'm using this codec on my Linux box ]
>
> For ISO 8859-1, these characters happen to exist and even correspond
> to the same ordinals: 252, 253, and 254 (this is by design); so there
> is no problem encoding them, and the resulting bytes even happen to
> match the codepoints of the characters.
>
> > cp1252
>
> cp1252 is designed after ISO 8859-1 and also has those same three characters:
>
> >>> for char in b'\xfc\xfd\xfe'.decode('cp1252'):
> ...     print(unicodedata.name(char))
> ...
> LATIN SMALL LETTER U WITH DIAERESIS
> LATIN SMALL LETTER Y WITH ACUTE
> LATIN SMALL LETTER THORN
>
> > latin1
>
> Latin-1 is just another name for ISO 8859-1.
>
> > utf-8
>
> UTF-8 is a *multi-byte* encoding.  It can encode any Unicode
> characters, so you can represent those three characters in UTF-8, but
> with a different (and longer) byte sequence:
>
> >>> print('\xfc\xfd\xfe'.encode('utf8'))
> b'\xc3\xbc\xc3\xbd\xc3\xbe'
>
> > cp437
>
> cp437 is another 8-bit encoding, but it maps entirely different
> characters to those three bytes:
>
> >>> for char in b'\xfc\xfd\xfe'.decode('cp437'):
> ...     print(unicodedata.name(char))
> ...
> SUPERSCRIPT LATIN SMALL LETTER N
> SUPERSCRIPT TWO
> BLACK SQUARE
>
> As it happens, the character at codepoint 252 (that's LATIN SMALL
> LETTER U WITH DIAERESIS) does exist in cp437.  It maps to the byte
> 0x81:
>
> >>> '\xfc'.encode('cp437')
> b'\x81'
>
> The other two Unicode characters, at codepoints 253 and 254, do not
> exist at all in cp437 and cannot be encoded.
>
> > If I'm not mistaken, all of these codecs can handle the complete 8bit
> > character set.
>
> There is no "complete 8bit character set".  cp1252, Latin1, and cp437
> are all 8-bit character sets, but they're *different* 8-bit character
> sets with only partial overlap.
>
> > However, on Windows 7, I am only able to use 'cp437' to display (print) data
> > with those characters in Python. If I use any other encoding, Windows laughs
> > at me with this error message:
> >
> >   File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
> >     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> > UnicodeEncodeError: 'charmap' codec can't encode character '\xfd' in
> > position 3: character maps to <undefined>
>
> It would be helpful to see the code you're running that causes this error.
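Ian's point can be checked for all four codecs at once. This minimal sketch tries to encode the three characters U+00FC, U+00FD and U+00FE with each codec. For each one it records either the resulting bytes or the first character that fails.

```python
chars = "\xfc\xfd\xfe"   # the characters chr(252), chr(253), chr(254)

results = {}
for codec in ("latin-1", "cp1252", "utf-8", "cp437"):
    try:
        results[codec] = chars.encode(codec)
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character the codec cannot map.
        results[codec] = f"fails at U+{ord(chars[exc.start]):04X}"

for codec, outcome in results.items():
    print(f"{codec:8} {outcome!r}")
```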

I'm using subprocess.Popen to run a process that sends a list of codepoints to 
the calling Python program. The list is sent to stdout as a string.  Here is a 
simple example that encodes the string "Dead^Parrot", where (for this example) 
I'm using '^' to represent chr(254) :

encoded_string = '[68,101,97,100,254,80,97,114,114,111,116]'

This in turn is handled in __repr__ with:

return bytes((eval(encoded_string))).decode('cp437')

I get the aforementioned 'error' if I use any other encoding.
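This snippet prompts two observations, shown here as a minimal sketch. First, `json.loads` (or `ast.literal_eval`) parses the list without the risk that `eval` executes arbitrary text received from the subprocess.

Second, the encoding passed to `decode()` only decides which characters you get. `decode('cp437')` turns byte 254 into U+25A0, which a cp437 console can print. `decode('latin-1')` turns it into U+00FE, which the cp437 stdout then cannot encode, and that is presumably the error seen with other encodings. `printable_on` is a hypothetical helper.

```python
import json

encoded_string = "[68,101,97,100,254,80,97,114,114,111,116]"

# json.loads parses the list literal without executing arbitrary code.
raw = bytes(json.loads(encoded_string))

as_cp437 = raw.decode("cp437")     # byte 254 -> U+25A0 BLACK SQUARE
as_latin1 = raw.decode("latin-1")  # byte 254 -> U+00FE LATIN SMALL LETTER THORN


def printable_on(text: str, encoding: str) -> bool:
    """True if text can be written to a stream that uses this encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False
```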

>
> > Furthermore I get this from IDLE:
> >
> > >>> import locale
> > >>> locale.getdefaultlocale()
> > ('en_US', 'cp1252')
> >
> > I also get 'cp1252' when running the same script from a Windows command
> > prompt.
> >
> > So there is a contradiction between the error message and the default
> > encoding.
>
> If you're printing to stdout, it's going to use the encoding
> associated with stdout, which does not necessarily have anything to do
> with the default locale.  Use this to determine what character set you
> need to be working in if you want your data to be printable:
>
> >>> import sys
> >>> sys.stdout.encoding
> 'cp437'

Hmmm. So THAT'S why I am only able to use 'cp437'. I had (mistakenly) thought 
that I could just indicate whatever encoding I wanted, as long as the codec 
supported it.
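The encoding that matters is the one attached to the stream being written to. This minimal sketch inspects both values and shows one way to fall back when the stream's encoding cannot represent the text. `safe_for_stream` is a hypothetical helper, and `backslashreplace` is one of the standard `errors` handlers.

```python
import io
import locale
import sys

# What the OS locale prefers vs. what stdout actually encodes with.
print("locale preferred:", locale.getpreferredencoding(False))
print("stdout encoding: ", sys.stdout.encoding)


def safe_for_stream(text: str, stream) -> str:
    """Return text unchanged if the stream's encoding can represent it,
    otherwise a copy with unencodable characters shown as escapes."""
    encoding = getattr(stream, "encoding", None) or "ascii"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# Simulate a cp437 console with an in-memory stream.
console = io.TextIOWrapper(io.BytesIO(), encoding="cp437")
result = safe_for_stream("Dead\xfeParrot", console)
```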

>
> > Why am I restricted from using just that one codec? Is this a Windows or

An object is and isn't an instance of a class at the same time

2012-12-09 Thread danielk
I was debugging some code using isinstance() to make sure the correct object
was being passed to the method and came across something that is really ticking
me off.

I have a class called 'Jitem' in its own file called 'jitem.py'. It's part of a 
package called 'jukebox'. I also have '__all__' that includes 'jitem' so that I 
can do:

from jukebox import *

There is another class that has a method that does this (simplified for this 
example):

def execute(self, command):

I stuck this debug code in the method:

if not isinstance(command, jitem.Jitem):
    print(command.__class__)
    raise TypeError("Command must be an instance of Jitem.")

When this method gets run in a test script, it returns this:

D:\home\python>python jtest.py

Traceback (most recent call last):
  File "jtest.py", line 4, in <module>
    executeResults = jc.execute(cmnd)
  File "D:\home\python\jukebox\jconnection.py", line 225, in execute
    raise TypeError("Command must be an instance of Jitem.")
TypeError: Command must be an instance of Jitem.

How can it fail the isinstance() check and still report that it is the proper class?
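The thread doesn't show the cause. One common way to get this symptom, however, is for the same source file to be imported twice under two different module names. For example, it might be imported once as `jitem` and once as `jukebox.jitem`, depending on `sys.path` and the import statements. Each import executes the file again and creates a separate class object with the same name, so `isinstance()` against the other copy fails.

The minimal sketch below simulates this with two module objects built from the same hypothetical source text. Printing `type(command).__module__` and `id(type(command))` next to `jitem.Jitem.__module__` and `id(jitem.Jitem)` in the debug code would show whether this is what is happening.

```python
import types

# Hypothetical stand-in for the contents of jitem.py.
SOURCE = '''
class Jitem:
    pass
'''


def load_as(module_name: str) -> types.ModuleType:
    """Execute the same source as a fresh module, like a second import would."""
    module = types.ModuleType(module_name)
    exec(SOURCE, module.__dict__)
    return module


first = load_as("jitem")           # e.g. imported via a top-level path entry
second = load_as("jukebox.jitem")  # e.g. imported through the package

command = first.Jitem()
same_name = type(command).__name__ == second.Jitem.__name__  # both 'Jitem'
is_instance = isinstance(command, second.Jitem)              # distinct classes
```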

Dan Klein