[Tutor] Python String and Unicode data types and Encode Decode Functions

2015-12-20 Thread Anshu Kumar
Hi Everyone,

In my current project I am dealing a lot with the unicode type. Some text
files contain Unicode text to accommodate data in multiple languages, and I
have to parse these files, which are in XML or YAML format, using the xml
and yaml libraries. I have run into several errors caused by unicode and
had to encode such text to UTF-8 using the encode('utf-8') method. Although
I could resolve my issue, I still do not understand the unicode and string
datatypes or the encode and decode methods.

I know certain facts, like:

1. A string is nothing but a byte array, so it has only 8 bits to encode
each character using ASCII. It should not be used when we have characters
from other languages; that's why a broader type, unicode, is used.

2. Python internally uses a different implementation to store strings in RAM.

3. The print function can print both string and unicode because it has some
kind of function overloading.

4. u'', that is, u prefixed before single or double quotes, tells the
Python interpreter that the following literal is unicode and not a string.


Now my doubts start

*1. I tried the code below and see that Japanese characters can be accommodated
in strings. I do not understand how this is possible.*

>>> temo = 'いい'
>>> temo
'\xe3\x81\x84\xe3\x81\x84'
>>> print temo
いい
>>> type(temo)
<type 'str'>
>>>


*2. When I try to iterate over the characters, I do not get anything meaningful*

>>> for character in temo:
...     print character
...

�
�

�
�

*3. When I do the following, I get the length as 6*

>>> len(temo)
6

Why so?


*4. When I try to print each character encoded as UTF-8, I get the error below*

>>> for character in temo:
...     print character.encode('utf-8')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0:
ordinal not in range(128)


With the facts I know, I cannot work out how unicode and string work behind
the scenes. Please help me understand this magic.

Thanks a lot in advance,
Anshu
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Object oriented design

2015-12-20 Thread Alan Gauld
On 20/12/15 00:48, jamie hu wrote:

>trying to think/implement. I can create a given student object based on
>given firstname, lastname and grade. How do I find all objects matching
>particular criteria and return them to caller? Do I need to iterate/select
>through some list/database and create Student objects again?

You need to store a reference to each object in some kind of
container - a list or dictionary for example, or maybe a
full-blown database.

>class Student():
>    def __init__(self, firstname, lastname, grade):
>        self.firstname = firstname
>        self.lastname = lastname
>        self.grade = grade
>
>    def set_grade(self, grade):
>        self.grade = grade
>
>    # Find all Students with given lastname
>    @classmethod
>    def find_by_lastname(cls, lastname):
>        # How do I return all student objects that have same lastname?
>        # Do I need to call init method again? I am confused here.
>        pass

You don't need to call init a second time but you do need to put
your instances into a container as you create them. A common way
to do this is to have a class attribute (ie not an instance one)
called _instances or similar and have a line at the end
of __init__() that does

Student._instances.append(self)

Your class method can then traverse the _instances collection
checking each instance until it finds the desired object.
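
The class-attribute registry described above can be sketched as follows. This is only an illustration in Python 3 syntax: the sample names are made up, and returning a list of every match (rather than stopping at the first one) is an assumption based on the original question.

```python
class Student:
    # Class attribute shared by every instance: the registry of all students.
    _instances = []

    def __init__(self, firstname, lastname, grade):
        self.firstname = firstname
        self.lastname = lastname
        self.grade = grade
        # Register the new object so class methods can find it later.
        Student._instances.append(self)

    @classmethod
    def find_by_lastname(cls, lastname):
        """Return a list of every registered student with this last name."""
        return [s for s in cls._instances if s.lastname == lastname]


# Hypothetical sample data.
Student("Ada", "Smith", 9)
Student("Bob", "Jones", 10)
Student("Cy", "Smith", 11)

smiths = Student.find_by_lastname("Smith")
print([s.firstname for s in smiths])
```

No second call to __init__() is involved: the class method only hands back references to objects that already exist.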

If you have many objects, and especially if they will not
all be instantiated at once you would use a database and
in that case the class method would check the instances
collection first to see if you already had the object
in memory and, if not, instantiate it from the database.
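
A rough sketch of that check-the-cache-first idea, in Python 3 syntax. The FAKE_DB dictionary is a hypothetical stand-in for a real database, and the student_id key and the get() method name are assumptions introduced only for this example.

```python
# Hypothetical stand-in for a database table, keyed by student id.
FAKE_DB = {
    1: ("Ada", "Smith", 9),
    2: ("Bob", "Jones", 10),
}


class Student:
    _instances = {}  # in-memory cache: student id -> Student object

    def __init__(self, student_id, firstname, lastname, grade):
        self.student_id = student_id
        self.firstname = firstname
        self.lastname = lastname
        self.grade = grade
        Student._instances[student_id] = self

    @classmethod
    def get(cls, student_id):
        """Return the cached object, or build it from the stored row."""
        if student_id in cls._instances:
            return cls._instances[student_id]
        row = FAKE_DB[student_id]  # raises KeyError for an unknown id
        return cls(student_id, *row)


first = Student.get(1)   # not cached yet, so built from FAKE_DB
second = Student.get(1)  # served from the in-memory cache
print(first is second)
```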

HTH
-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: [Tutor] Python String and Unicode data types and Encode Decode Functions

2015-12-20 Thread Steven D'Aprano
On Sun, Dec 20, 2015 at 08:17:21AM +0530, Anshu Kumar wrote:

> I know certain facts like

What version of Python are you using? My *guess* is that you are using 
Python 2.7, is that correct?

What operating system are you using? Windows, Linux, Mac OS X, Unix, 
something else?


> 1. A string is nothing but a byte array, so it has only 8 bits to encode
> each character using ASCII. It should not be used when we have characters
> from other languages; that's why a broader type, unicode, is used.

If you are using Python 2, this is correct.

> 2. Python internally uses different implementation  to store strings in RAM

Correct.


> 3. print function can print both string and unicode because it has some
> kind of function overloading.

Mostly correct. `print` can print any object in Python (within reason). 
The details are probably not important.


> 4. u'' , that is u prefixed before single quotes or double quotes tells
> python interpreter that the following type is unicode and not a string.

In Python 2, this is correct.


> Now my doubts start

Unicode is sometimes hard to understand, and unfortunately Python 2 
makes it even harder rather than easier.


> *1. I tried the code below and see that Japanese characters can be accommodated
> in strings. I do not understand how this is possible.*
> 
> >>> temo = 'いい'
> >>> temo
> '\xe3\x81\x84\xe3\x81\x84'
> >>> print temo
> いい
> >>> type(temo)
> <type 'str'>

This appears to be Python 2 code.

The fact that this works is an accident of the terminal/console you are 
using. If you tried it on another computer, using a different OS, you 
might find the results will change or possibly won't work at all.

What seems to be happening is this:

(1) You type, or paste, two Unicode characters into the terminal input 
buffer, namely `HIRAGANA LETTER I` repeated twice.

(2) The terminal is set to use UTF-8 encoding, so it puts the six bytes 
\xe3\x81\x84 \xe3\x81\x84 into the buffer, and displays HIRAGANA LETTER 
I twice.

(3) Python generates a string object containing six bytes (as above).

(4) When you print those six bytes, the terminal recognises this as 
UTF-8, and displays HIRAGANA LETTER I (twice).


This seems to work, but it is an accident. If you change the terminal 
settings, you will see different results:


# terminal using UTF-8 as the encoding
py> s = 'いい'  # looks like HIRAGANA LETTER I twice but actually six bytes
py> print s  # terminal recognises this as UTF-8
いい
py> s  # The actual six bytes.
'\xe3\x81\x84\xe3\x81\x84'

# now I change the terminal to use ISO-8859-7 (Greek) instead.
# the same six bytes now display as a Greek character plus invisible 
# control characters
py> print s  
γγ


# now I change the terminal to use Latin-1 ISO-8859-1
py> print s
ãã


So the results you get are dependent on the terminal's encoding. The 
fact that it happens to work on your computer is a lucky accident.
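
The same effect can be reproduced without changing any terminal settings by decoding the six bytes explicitly. This is a minimal sketch in Python 3 syntax (an assumption, since the session above is Python 2), using the three encodings mentioned above; the variable names are illustrative.

```python
# The six bytes that the UTF-8 terminal handed to Python.
data = b'\xe3\x81\x84\xe3\x81\x84'

as_utf8 = data.decode('utf-8')        # two HIRAGANA LETTER I characters
as_greek = data.decode('iso-8859-7')  # gamma plus two control characters, twice
as_latin1 = data.decode('latin-1')    # a-tilde plus two control characters, twice

print(len(as_utf8), len(as_greek), len(as_latin1))
print(repr(as_greek))
print(repr(as_latin1))
```

The bytes never change; only the interpretation applied to them does.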


Instead, you should ensure Python knows to use proper Unicode text:

py> s = u'いい'
py> print s
いい


Provided your terminal is capable of entering the HIRAGANA LETTER I 
character in the first place, then s will ALWAYS be treated as that same 
HIRAGANA LETTER I. (Although, if you change the encoding of the 
terminal, it may print differently. That's not Python's fault -- that's 
the terminal.)

In this case, instead of s being a *byte* string of three bytes, 
\xe3 \x81 \x84, s is a Unicode string of ONE character い. (Double 
everything if you enter the character twice.)
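
To check that a Unicode string really holds whole characters, the standard unicodedata module can name each one. A small sketch in Python 3 syntax, where every str is already Unicode (in Python 2 the literal would need the u prefix):

```python
import unicodedata

s = 'いい'  # Python 3 str; in Python 2 this would be written u'いい'

for ch in s:
    # Each item is one whole character, identified by its code point.
    print(hex(ord(ch)), unicodedata.name(ch))

names = [unicodedata.name(ch) for ch in s]
```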


> *2. When I try to iterate over the characters, I do not get anything meaningful*
> 
> >>> for character in temo:
> ...     print character
> ...

You are printing the six bytes one at a time:

\xe3 \x81 \x84 \xe3 \x81 \x84

None of these bytes is a complete UTF-8 sequence on its own, so what
they look like on your system could be anything -- a blank space, no
space at all, or the "missing character" glyph.
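
Iterating over the byte string hands out one byte at a time, and no single byte of a three-byte UTF-8 sequence is valid by itself. A minimal Python 3 sketch of that, where errors='replace' stands in for whatever the terminal chooses to show:

```python
data = b'\xe3\x81\x84\xe3\x81\x84'

# Decode each byte on its own; errors='replace' substitutes U+FFFD
# (REPLACEMENT CHARACTER) wherever the input is not valid UTF-8.
pieces = [bytes([b]).decode('utf-8', errors='replace') for b in data]
print(pieces)

# Decoding the six bytes together recovers the two characters.
whole = data.decode('utf-8')
print(whole)
```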


> *3. When I do the following, I get the length as 6*
> 
> >>> len(temo)
> 6
> 
> Why so?

Because your terminal has entered each Unicode character as three UTF-8 
bytes; you have two characters, and 2*3 is 6. Hence the byte string is 
length six.

If you use a Unicode string, the length will be two Unicode characters.
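
The two lengths can be compared directly. A short sketch in Python 3 syntax, where the Unicode string is a plain str and the byte string is obtained by encoding it:

```python
text = 'いい'                   # Unicode string: two characters
encoded = text.encode('utf-8')  # byte string: the UTF-8 encoding of text

print(len(text))     # counts characters
print(len(encoded))  # counts bytes

# Number of UTF-8 bytes used by each individual character.
per_char = [len(ch.encode('utf-8')) for ch in text]
```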

Internally, in the computer's RAM, those two characters might be stored 
as any of the following bytes:

# UTF-16 Big Endian
\x30 \x44 \x30 \x44

# UTF-16 Little Endian
\x44 \x30 \x44 \x30

# UTF-32 Big Endian
\x00 \x00 \x30 \x44 \x00 \x00 \x30 \x44

# UTF-32 Little Endian
\x44 \x30 \x00 \x00 \x44 \x30 \x00 \x00

# UTF-8
\xe3 \x81 \x84 \xe3 \x81 \x84

depending on the implementation and version of Python. The internal 
representation isn't very important, the important thing is that Unicode 
strings are treated as sequences of Unicode characters, not as sequences 
of bytes.
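
The byte sequences listed above can be reproduced by encoding the string explicitly. Note that this sketch (Python 3 syntax) shows what each encoding looks like on the outside; it does not inspect what a particular interpreter actually keeps in memory.

```python
text = 'いい'  # U+3044 twice

layouts = {}
for codec in ('utf-16-be', 'utf-16-le', 'utf-32-be', 'utf-32-le', 'utf-8'):
    raw = text.encode(codec)
    layouts[codec] = raw
    # Show every byte as \xNN, matching the listing above.
    print(codec, ' '.join('\\x%02x' % b for b in raw))
```

Whatever the byte layout, decoding any of these with the matching codec gives back the same two-character string.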


> *4. When I try to print each character encoded as UTF-8, I get the error below*
> 
> >>> for character in temo:
> ...     print character.encode('utf-8')
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 2, in <module>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0:
> ordinal not in range(128)


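You have a byte string here, not a Unicode string, and a byte string has to
be decoded before it can be encoded.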
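
The traceback in question 4 names the 'ascii' codec even though the code asked for 'utf-8'. In Python 2, calling encode() on a byte string first decodes it implicitly with the default ASCII codec, and that hidden step is what fails. The sketch below, in Python 3 syntax (an assumption, since Python 3 has no implicit step), spells the decode out and then shows the usual remedy of decoding the whole byte string as UTF-8 first.

```python
data = b'\xe3\x81\x84\xe3\x81\x84'

# The hidden step Python 2 performs for byte_string.encode('utf-8'):
# an ASCII decode, which fails on any byte above 0x7f.
try:
    data[0:1].decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc
print(error)

# Decoding explicitly as UTF-8 first gives real characters to iterate over.
characters = list(data.decode('utf-8'))
reencoded = [ch.encode('utf-8') for ch in characters]
print(reencoded)
```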

[Tutor] Object oriented design

2015-12-20 Thread jamie hu
Hi,

I am starting with Python object oriented concepts and have difficulty
understanding object instantiation. Below is some example code that I am
trying to think through and implement. I can create a given student object
based on a given firstname, lastname and grade. How do I find all objects
matching particular criteria and return them to the caller? Do I need to
iterate/select through some list/database and create Student objects again?

Sorry that the question is not quite clear, but I am confused and not sure
how to put it in the right words.

class Student():
    def __init__(self, firstname, lastname, grade):
        self.firstname = firstname
        self.lastname = lastname
        self.grade = grade

    def set_grade(self, grade):
        self.grade = grade

    # Find all Students with given lastname
    @classmethod
    def find_by_lastname(cls, lastname):
        # How do I return all student objects that have same lastname?
        # Do I need to call init method again? I am confused here.
        pass

Thanks,

Jamie


Re: [Tutor] interface

2015-12-20 Thread Alan Gauld
On 20/12/15 02:21, Alex Kleider wrote:

> First I've heard of Tix!
> Much to learn.

A potentially useful set of extra widgets on top of Tkinter.
Unfortunately the Tkinter port of the original Tcl/Tk TIX
package is incomplete and only reliable for about half the
extended widgets (thankfully the most common ones such
as scrolledList and a tabbed Notebook etc).

The biggest disappointment is the Grid widget, which
theoretically should work, but despite many attempts I've
failed to get anything useful out of it. It's on my todo list
to spend some time either fixing it, or documenting how it
works, or both...

But because Tix is a superset of Tkinter, I rarely use
raw Tkinter nowadays; I usually just start with Tix.

-- 
Alan G




Re: [Tutor] interface

2015-12-20 Thread Alex Kleider

On 2015-12-20 06:11, Alan Gauld wrote:
> On 20/12/15 02:21, Alex Kleider wrote:
>
>> First I've heard of Tix!
>
> A potentially useful set of extra widgets on top of Tkinter.
> Unfortunately the Tkinter port of the original Tcl/Tk TIX
> package is incomplete and only reliable for about half the
> extended widgets (thankfully the most common ones such
> as scrolledList and a tabbed Notebook etc).
>
> The biggest disappointment is the Grid widget which
> theoretically should work but despite many attempts I've
> failed to get anything useful. It's on my todo list to spend
> some time either fixing it, or documenting how it works,
> or both...
>
> But because Tix is a superset of Tkinter I rarely use
> raw Tkinter nowadays I usually just start with Tix.


Thanks for the background insight.
How does Ttk (T for 'themed' I assume) fit in? Is it used in addition to
Tix (as it would be used in addition to tkinter), or is it an either/or
situation?

Alex


Re: [Tutor] Object oriented design

2015-12-20 Thread Danny Yoo
On Sat, Dec 19, 2015 at 4:48 PM, jamie hu wrote:
>I am starting with Python object oriented concepts and have difficulty in
>understanding object instantiation. Below is an example code that I am
>trying to think/implement. I can create a given student object based on
>given firstname, lastname and grade. How do I find all objects matching
>particular criteria and return them to caller? Do I need to iterate/select
>through some list/database and create Student objects again?


In order to find something, that something needs to be accessible from
"somewhere".

In typical beginner programs, that "somewhere" is an in-memory
collection, like a list or dictionary, as Alan suggests.  I'd expect,
for your purposes, that this is an appropriate representation for
"somewhere".

find_by_lastname needs to know about this "somewhere".  You have two
options:  1. pass the "somewhere" in as an explicit parameter, or
2. hardcode it within find_by_lastname's definition.

We expect that #1 will look something like:

#
"""Collection of things."""
STUDENTS = []

...

def find_by_lastname(students, lastname):
    """Contract: listof(Student) string -> Student
    Given a list of students and a student's last name,
    returns the student with that last name.
    """
    # ... fill me in


## later, we can call find_by_lastname, passing in the
## collection as an explicit argument.
find_by_lastname(STUDENTS, "jamie")
#
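
One way the "fill me in" body for option #1 might look, as a sketch only. It assumes a Student class with a lastname attribute, as in the original question, and it returns a list of all matches rather than a single student, since the question asked for all matching objects. Both choices, and the sample names, are assumptions rather than part of the outline above.

```python
class Student:
    def __init__(self, firstname, lastname, grade):
        self.firstname = firstname
        self.lastname = lastname
        self.grade = grade


def find_by_lastname(students, lastname):
    """Contract: listof(Student) string -> listof(Student)
    Given a list of students and a last name, returns every
    student in that list with that last name.
    """
    return [s for s in students if s.lastname == lastname]


# Hypothetical sample data.
STUDENTS = [
    Student("Ada", "Smith", 9),
    Student("Bob", "Jones", 10),
    Student("Cy", "Smith", 11),
]

matches = find_by_lastname(STUDENTS, "Smith")
print([s.firstname for s in matches])
```

Because the collection is a parameter, the same function works on any list of students, which also makes it easy to try out on a small hand-made list.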


And #2 will probably look something like:

#
"""Collection of things."""
STUDENTS = []

...

def find_by_lastname(lastname):
    """Contract: string -> Student
    Given a student's last name, returns the student
    with that last name, looking through STUDENTS.
    """
    # ... fill me in


## later, we can call find_by_lastname without passing the
## collection, because it is hardcoded in the definition.
find_by_lastname("jamie")
#
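
For comparison, a possible body for option #2, again only a sketch. The Student class, the sample names, and the decision to return a list of all matches are assumptions; the point is that the function reads the module-level STUDENTS directly instead of receiving it as an argument.

```python
class Student:
    def __init__(self, firstname, lastname, grade):
        self.firstname = firstname
        self.lastname = lastname
        self.grade = grade


# Module-level collection that the function below is hardcoded to use.
STUDENTS = []


def find_by_lastname(lastname):
    """Contract: string -> listof(Student)
    Returns every student in STUDENTS with the given last name.
    """
    return [s for s in STUDENTS if s.lastname == lastname]


# Hypothetical sample data.
STUDENTS.append(Student("Ada", "Smith", 9))
STUDENTS.append(Student("Bob", "Jones", 10))

found = find_by_lastname("Jones")
print([s.firstname for s in found])
```

The call is shorter, but the function can only ever search this one collection.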


This is a rough sketch.  So, which one do you choose?

I have no idea!

This is a decision point, and one that you need to resolve, because
either choice has its own advantages and tradeoffs.  Assuming this is
an assignment, you need to talk with your instructor to see if there's
one that they had in mind, or if this is something you get to decide.
Personally, I don't like hardcoding, so #1 is my pick, but #2 has its
advantages too: it's easier to call.


Please feel free to ask questions.  Good luck.


Re: [Tutor] interface

2015-12-20 Thread Alan Gauld
On 20/12/15 20:00, Alex Kleider wrote:

>> But because Tix is a superset of Tkinter I rarely use
>> raw Tkinter nowadays I usually just start with Tix.
> 
> Thanks for the background insight.
> How does Ttk (T for 'themed' I assume) fit in? is it used in addition to 
> Tix

Yes. Tkinter and Tix both share the same look and feel
and are just collections of widgets.

Ttk is a smaller set of widgets but uses the style of the
native toolkit (or at least a much closer approximation
than Tkinter). Personally, the look of Tk doesn't offend
me that much, so I rarely bother using ttk, but if you
plan on sharing apps with folks who live and breathe Windows
or Mac then they might feel more comfortable with
ttk buttons/menus. But it does need an extra
import/reference beyond Tkinter or Tix.
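
A small sketch of what that extra import looks like in practice, placing a classic Tk button next to a themed ttk one. It uses the Python 3 module names (tkinter and tkinter.ttk), which is an assumption; the function name build_demo and the button labels are made up for the example.

```python
def build_demo():
    """Build a window with one classic Tk button and one themed ttk button."""
    # Imported inside the function so that loading this file does not
    # require Tk to be installed; calling build_demo() does.
    import tkinter as tk
    from tkinter import ttk  # the extra import that ttk needs

    root = tk.Tk()
    tk.Button(root, text="Classic Tk button").pack(padx=10, pady=5)
    ttk.Button(root, text="Themed ttk button").pack(padx=10, pady=5)
    return root


# To see the two buttons side by side, run:
#     build_demo().mainloop()
```

How different the two buttons look depends on the platform and on the theme ttk picks up.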


-- 
Alan G

