Re: [Tutor] How Python handles data (was guess-my-number programme)

Steven D'Aprano Tue, 27 Sep 2011 09:01:04 -0700

Wayne Werner wrote:

When you do something like this in C:


int x = 0;
int y = 0;

What you have actually done behind the scenes is allocated two bytes of
memory(IIRC that's in the C spec, but I'm not 100% sure that it's guaranteed
to be two bytes). Perhaps they are near each other, say at addresses
0xab0fcd and 0xab0fce. And in each of these locations the value of 0 is
stored.

The amount of memory will depend on the type of the variable. In C, you have to declare what type the variable will be. The compiler then knows how much space to allocate for it.

When you create a variable, memory is allocated, and you refer to that
location by the variable name, and that variable name always references that
address, at least until it goes out of scope. So if you did something like
this:

x = 4;
y = x;

Then x and y contain the same value, but they don't point to the same
address.

Correct, at least for languages like C or Pascal that have the "memory location" model for variables.

In Python, things are a little bit more ambiguous because everything is an
object.  So if you do this:

No. There is nothing ambiguous about it, it is merely different from C. The rules are completely straightforward and defined exactly.

Also, the fact that Python is object oriented is irrelevant to this question. You could have objects stored and referenced at memory locations, like in C, if the language designer wanted it that way.

x = 4
y = x

Then it's /possible/ (not guaranteed) that y and x point to the same memory
location. You can test this out by using the 'is' operator, which tells you
if the variables reference the same object:

The second half of your sentence is correct, you can test it with the 'is' operator. But the first half is wrong: given the two assignments shown, x=4 and y=x, it *is* guaranteed that x and y will both reference the same object. That is a language promise made by Python: assignment never makes a copy. So if you have


x = 4

and then you do

y = x

the language *promises* that x and y now are names for the same object. That is, "x is y" will return True, or id(x) == id(y).
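
A minimal sketch of that promise, runnable as a script. The names big and alias are made up for illustration; the point is only that assignment binds a second name and never copies:

```python
x = 4
y = x          # binds the name y to the same object; no copy is made

print(x is y)            # True
print(id(x) == id(y))    # True

# The same holds for any object, including large mutable ones.
big = list(range(1000))
alias = big
print(alias is big)      # True
```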



However, what is not promised is the behaviour of this:

x = 4
y = 4

In this case, you are doing two separate assignments where the right hand side is given by a literal which merely happens to be the same. The compiler is free to either create two separate objects, both with value 4, or just one. In CPython's case, it reuses some small numbers, but not larger ones:


>>> x = 4
>>> y = 4
>>> x is y
True
>>> x = 40000
>>> y = 40000
>>> x is y
False

CPython caches a small range of integers (-5 through 256 in current versions, I believe), although that will depend on exactly which version of CPython you are using.

The reason for caching small integers is that it is faster to look them up in the cache than to create a new object each time; but the reason for only caching a handful of them is that the cache uses memory, and you wouldn't want billions of integers being saved for a rainy day.
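
Since the caching is an implementation detail, the practical rule is to compare values with == and keep "is" for identity checks. A rough sketch (the variable names are arbitrary, and the result of the "is" test is deliberately not shown because it may differ between implementations and versions):

```python
a = int("40000")   # built at run time from a string
b = int("40000")

print(a == b)   # True: equal values, on every implementation
print(a is b)   # implementation-dependent; do not rely on the result
```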

x = 4
y = x
x is y

True

But this is not guaranteed behavior - this particular time, python happened
to cache the value 4 and set x and y to both reference that location.

As I've said, this is guaranteed behaviour, but furthermore, you shouldn't think about objects ("variables") in Python having locations. Of course, in reality they do, since it would be impossible -- or at least amazingly difficult -- to design a programming language without the concept of memory location. But as far as *Python* is concerned, rather than the underlying engine that makes Python go, variables don't have locations in any meaningful sense.

Think of objects in Python as floating in space, rather than lined up in nice rows with memory addresses. From Python code, you can't tell what address an object is at, and if you can, you can't do anything with the knowledge.

Some implementations, such as CPython, expose the address of an object as the id(). But you can't do anything with it, it's just a number. And other implementations, such as Jython and IronPython, don't do that. In those, every object gets a unique number, starting from 1 and counting up. If an object is deleted, the id doesn't get reused in Jython and IronPython (unlike CPython).
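
What you *can* rely on is small: id() returns an integer that is unique among objects alive at the same time and constant for the object's lifetime. A short sketch of just those properties (first and second are illustrative names):

```python
first = []
second = []

print(isinstance(id(first), int))   # True: id() returns an integer
print(id(first) != id(second))      # True: both objects are alive at once
print(id(first) == id(first))       # True: constant while the object lives
```

Nothing here depends on whether the number happens to be a memory address.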

Unlike the C "memory address" model, Python's model is of "name binding". Every object can have zero, one, or more names:


print([])  # Here, the list has no name.
x = []  # Here, the list has a single name, "x"
x = y = []  # Here, the list has two names, "x" and "y".

In practice, Python uses a dictionary to map names to objects. That dictionary is exposed to the user using the function globals().

The main differences between "memory location" variables and "name binding" variables are:

(1) Memory locations are known by the compiler at compile-time, but only at run-time for name binding languages. In C-like languages, if I say:
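
As a rough illustration, you can search that dictionary for every name bound to a given object. The helper names_for below is hypothetical, written only for this example:

```python
def names_for(obj, namespace):
    """Return the names in `namespace` bound to exactly this object."""
    return sorted(name for name, value in namespace.items() if value is obj)

x = y = []   # one list, two names
z = []       # an equal but separate list

# Run at module level, this prints ['x', 'y'].
print(names_for(x, globals()))
```

Note that z is not reported: it is equal to x, but it is a different object.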


x = 42
print x

the compiler knows to store 42 into location 123456 (say), and then have the print command look at location 123456. But with name-binding, the compiler doesn't know what location 42 will actually end up at until run-time. It might be anything.

(2) Memory location variables must be fixed sizes, while name-binding can allow variables to change size.

(3) Memory location variables must copy on assignment: x = 4; y = x makes a copy of x to store in y, since x and y are different variables and therefore different locations. Name-binding though, gives the language designer a choice to copy or not.
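
Points (2) and (3) can be seen from Python itself. This is only a sketch with made-up names; it shows that a name can be rebound to objects of any size, and that Python's choice is not to copy unless you ask for it:

```python
# (2) A name is not tied to a size or type: it can be rebound freely.
value = 42
value = "a much longer piece of text"
value = [1, 2, 3]

# (3) Assignment does not copy: a and b are two names for one list.
a = [1, 2]
b = a
b.append(3)
print(a)   # [1, 2, 3]

# A copy has to be requested explicitly.
c = list(a)
c.append(4)
print(a)   # [1, 2, 3]
print(c)   # [1, 2, 3, 4]
```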




[...]

One thing that is important to note is that in each of these examples, the
data types are immutable. In C++ if you have a string and you add to the end
of that string, that string is still stored in the same location. In Python
there's this magical string space that contains all the possible strings in
existence[1] and when you "modify" a string using addition, what you're
actually doing is telling the interpreter that you want to point to the
string that is the result of addition, like 'hi' + '!'. Sometimes Python
stores these as the same object, other times they're stored as different
objects.

A better way of thinking about this is to say that when you concatenate two strings:


a = "hello"
b = "world"
text = a + b

Python will build a new string on the spot and then bind the name text to this new string.

The same thing happens even if you concatenate a string to an existing string, like this:


text = "hello"
text = text + "world"

Python looks at the length of the existing two strings: 5 and 5, allocates enough space for 10 letters, then copies letter-by-letter into the new string.
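
You can check that the old string is left alone by keeping a second name bound to it. A small sketch (the name original is only there for the demonstration):

```python
text = "hello"
original = text          # a second name for the same string

text = text + "world"    # builds a new string and rebinds text

print(text)       # helloworld
print(original)   # hello
print(len(text) == len("hello") + len("world"))   # True
```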

However, this can be slow for big strings, so CPython (but not Jython and IronPython) has an optimization that can *sometimes* apply. If there is only one reference to "hello", and you are concatenating to the end, then CPython can sneakily re-use the space already there by expanding the first string, then copying into the end of it. But this is an implementation-dependent trick, and not something you can rely on.
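
If you are building a string from many pieces and don't want to depend on that trick, the usual approach is to collect the pieces in a list and join them once. A minimal sketch with arbitrary sample data, showing that both approaches give the same result:

```python
words = ["spam"] * 5

# Repeated concatenation: may build a new string on every step.
text = ""
for word in words:
    text = text + word

# str.join: builds the final string in one go.
joined = "".join(words)

print(text == joined)   # True
```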




--
Steven

