Wayne Werner wrote:

> When you do something like this in C:
>
> int x = 0;
> int y = 0;
>
> What you have actually done behind the scenes is allocated two bytes of
> memory (IIRC that's in the C spec, but I'm not 100% sure that it's guaranteed
> to be two bytes). Perhaps they are near each other, say at addresses
> 0xab0fcd and 0xab0fce. And in each of these locations the value of 0 is
> stored.

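The amount of memory will depend on the type of the variable. In C, you have to declare what type the variable will be, and the compiler then knows how much space to allocate for it. In practice two ints need more than two bytes, though: the C standard only guarantees that an int has at least 16 bits, and on most current platforms it is 4 bytes. So x and y would typically take 8 bytes between them, and if they did happen to sit next to each other, their addresses would differ by 4, not 1.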
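From Python, the standard ctypes module can report how big a C int is on the machine you're running. This is a sketch, assuming ctypes is available (it is in CPython). A two-element array stands in for x and y, because an array is guaranteed to be contiguous while two separate variables are not:

```python
import ctypes

# Size in bytes of one C int on this platform.
int_size = ctypes.sizeof(ctypes.c_int)

# A contiguous array of two C ints, both initialised to 0.
pair = (ctypes.c_int * 2)(0, 0)

print("one int: ", int_size, "bytes")
print("two ints:", ctypes.sizeof(pair), "bytes")
```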


> When you create a variable, memory is allocated, and you refer to that
> location by the variable name, and that variable name always references that
> address, at least until it goes out of scope. So if you did something like
> this:
>
> x = 4;
> y = x;
>
> Then x and y contain the same value, but they don't point to the same
> address.

Correct, at least for languages like C or Pascal that have the "memory location" model for variables.
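A rough way to watch the memory-location model from Python is again ctypes, whose c_int objects each own their own block of C-style memory. This is only a sketch of the idea, not what a C compiler does: assigning one value into the other copies it, and the two addresses stay different.

```python
import ctypes

# Two separate C-style int variables, each with its own storage.
x = ctypes.c_int(0)
y = ctypes.c_int(0)

x.value = 4
y.value = x.value        # like `y = x;` in C: copy the value into y's storage

print(x.value, y.value)                            # 4 4
print(ctypes.addressof(x) == ctypes.addressof(y))  # False: different locations

x.value = 5              # changing x afterwards does not touch y
print(y.value)           # 4
```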


> In Python, things are a little bit more ambiguous because everything is an
> object.  So if you do this:

No. There is nothing ambiguous about it, it is merely different from C. The rules are completely straightforward and defined exactly.

Also, the fact that Python is object oriented is irrelevant to this question. You could have objects stored and referenced at memory locations, like in C, if the language designer wanted it that way.


> x = 4
> y = x
>
> Then it's /possible/ (not guaranteed) that y and x point to the same memory
> location. You can test this out by using the 'is' operator, which tells you
> if the variables reference the same object:

The second half of your sentence is correct: you can test it with the 'is' operator. But the first half is wrong. Given the two assignments shown, x=4 and y=x, it *is* guaranteed that x and y will both reference the same object. That is a language promise made by Python: assignment never makes a copy. So if you have

x = 4

and then you do

y = x

the language *promises* that x and y are now names for the same object. That is, "x is y" will return True, and equivalently id(x) == id(y).
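A quick check you can run as a script. Using a list as well makes the no-copy rule visible, since an int can't be changed in place:

```python
x = 4
y = x                    # binds y to the object x names; nothing is copied
print(x is y)            # True
print(id(x) == id(y))    # True

# The same rule is easier to see with a mutable object.
a = [1, 2]
b = a                    # b is just another name for the same list
b.append(3)
print(a)                 # [1, 2, 3]
```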


However, what is not promised is the behaviour of this:

x = 4
y = 4

In this case, you are doing two separate assignments where the right hand side is given by a literal which merely happens to be the same. The compiler is free to either create two separate objects, both with value 4, or just one. In CPython's case, it reuses some small numbers, but not larger ones:

>>> x = 4
>>> y = 4
>>> x is y
True
>>> x = 40000
>>> y = 40000
>>> x is y
False


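Current versions of CPython cache the integers from -5 to 256 inclusive. Older versions used a smaller range, so exactly which integers are cached depends on the version of CPython you are using, and it is an implementation detail you shouldn't rely on.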

The reason for caching small integers is that it is faster to look them up in the cache than to create a new object each time; but the reason for only caching a handful of them is that the cache uses memory, and you wouldn't want billions of integers being saved for a rainy day.
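To probe the cache yourself, the sketch below builds each integer twice at run time and checks whether both come back as the same object. It assumes that parsing a string with int() sidesteps any constant sharing the compiler might do; the helper name is_cached is just for illustration. On current CPython it should report the range -5 to 256; on other implementations the answer may differ or be empty.

```python
import platform

def is_cached(n):
    # Create two int objects for n independently at run time, then ask
    # whether the implementation handed back one shared object.
    return int(str(n)) is int(str(n))

cached = [n for n in range(-10, 300) if is_cached(n)]

if cached:
    print(platform.python_implementation(), "caches", min(cached), "to", max(cached))
else:
    print(platform.python_implementation(), "shows no small-int cache")
```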


> >>> x = 4
> >>> y = x
> >>> x is y
> True
>
> But this is not guaranteed behavior - this particular time, python happened
> to cache the value 4 and set x and y to both reference that location.

As I've said, this is guaranteed behaviour. But furthermore, you shouldn't think about objects ("variables") in Python having locations. Of course, in reality they do, since it would be impossible -- or at least amazingly difficult -- to design a programming language without the concept of memory location. But as far as *Python* is concerned, rather than the underlying engine that makes Python go, variables don't have locations in any meaningful sense.

Think of objects in Python as floating in space, rather than lined up in nice rows with memory addresses. From Python code, you can't tell what address an object is at, and even if you could, you couldn't do anything with the knowledge.

Some implementations, such as CPython, expose the address of an object as its id(). But you can't do anything with it; it's just a number. Other implementations, such as Jython and IronPython, don't do that: every object gets a unique number, starting from 1 and counting up, and if an object is deleted its id doesn't get reused (unlike CPython, where a later object can end up with the same id).
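The only thing id() promises is that objects which are alive at the same time have different ids. A minimal check:

```python
a = []
b = []
# Two objects that exist at the same time never share an id.
print(id(a) != id(b))    # True
# Whatever the implementation uses for it, an id is just an integer.
print(type(id(a)))       # <class 'int'>
```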

Unlike the C "memory address" model, Python's model is of "name binding". Every object can have zero, one, or more names:

print([])  # Here, the list has no name.
x = []  # Here, the list has a single name, "x"
x = y = []  # Here, the list has two names, "x" and "y".
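Names are bound to objects, not to each other, so rebinding one name is different from changing the object it refers to. A small sketch:

```python
x = y = []       # one list, two names
print(x is y)    # True

x.append(1)      # mutating the object is visible through either name
print(y)         # [1]

x = []           # rebinding x attaches it to a new list; y is unaffected
print(x, y)      # [] [1]
print(x is y)    # False
```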

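In practice, Python uses a dictionary to map global names to objects, and that dictionary is exposed to the user through the function globals(). (Local variables inside a function are stored differently in CPython, but they are still names bound to objects.)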
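Since a namespace is just a dictionary, you can look at the mapping directly. This sketch runs a two-line program with exec() inside a fresh dict (the name namespace is only illustrative) and then checks the module's own globals(), assuming the code runs at module level:

```python
# A fresh dict serves as the global namespace for a tiny program.
namespace = {}
exec("x = 4\ny = x", namespace)

# The names are keys; both map to the very same object.
print(namespace["x"], namespace["y"])        # 4 4
print(namespace["x"] is namespace["y"])      # True

# At module level, ordinary assignments land in the dict from globals().
answer = 42
print("answer" in globals())                 # True
```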

The main differences between "memory location" variables and "name binding" variables are:


(1) Memory locations are known by the compiler at compile-time, but only at run-time for name binding languages. In C-like languages, if I say:

x = 42
print x

the compiler knows to store 42 into location 123456 (say), and then have the print command look at location 123456. But with name-binding, the compiler doesn't know what location 42 will actually end up at until run-time. It might be anything.

(2) Memory location variables must be fixed sizes, while name-binding can allow variables to change size.

(3) Memory location variables must copy on assignment: x = 4; y = x makes a copy of x to store in y, since x and y are different variables and therefore different locations. Name-binding, though, gives the language designer a choice to copy or not.
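The copy-versus-bind difference in (3) can be modelled in a few lines. This is a toy for illustration only, not how CPython is implemented: a plain list stands in for RAM with slots chosen up front, and a dict stands in for a namespace. The names memory, slot and names are hypothetical.

```python
# Memory-location model (toy): each name owns a fixed slot, and
# "y = x" copies the contents of x's slot into y's slot.
memory = [0, 0]
slot = {"x": 0, "y": 1}                 # decided before the program runs
memory[slot["x"]] = 4
memory[slot["y"]] = memory[slot["x"]]   # y = x: copy the value
memory[slot["x"]] = 5                   # change x afterwards
print(memory[slot["y"]])                # 4: y kept its own copy

# Name-binding model (toy): names map to objects, and
# "y = x" makes y refer to the same object.
names = {}
names["x"] = [4]
names["y"] = names["x"]                 # y = x: no copy, just another name
names["x"].append(5)                    # change the object through x
print(names["y"])                       # [4, 5]: same object
```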



[...]
> One thing that is important to note is that in each of these examples, the
> data types are immutable. In C++ if you have a string and you add to the end
> of that string, that string is still stored in the same location. In Python
> there's this magical string space that contains all the possible strings in
> existence[1] and when you "modify" a string using addition, what you're
> actually doing is telling the interpreter that you want to point to the
> string that is the result of addition, like 'hi' + '!'. Sometimes Python
> stores these as the same object, other times they're stored as different
> objects.

A better way of thinking about this is to say that when you concatenate two strings:

a = "hello"
b = "world"
text = a + b

Python will build a new string on the spot and then bind the name text to this new string.
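You can confirm that the original strings survive untouched and that the result is a separate object:

```python
a = "hello"
b = "world"
text = a + b                      # builds a brand-new string object

print(text)                       # helloworld
print(a, b)                       # hello world: the originals are unchanged
print(text is a or text is b)     # False
```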

The same thing happens even if you concatenate a string to an existing string, like this:

text = "hello"
text = text + "world"

Python looks at the lengths of the two existing strings (5 and 5), allocates enough space for 10 letters, and then copies them letter by letter into the new string.
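The allocate-then-copy step can be sketched in pure Python. The function concat is hypothetical, for illustration only; CPython does the real work in C.

```python
def concat(first, second):
    """Toy model of string concatenation: allocate once, then copy."""
    # Room for every character of both strings.
    buffer = [""] * (len(first) + len(second))
    i = 0
    for ch in first:          # copy the first string, letter by letter
        buffer[i] = ch
        i += 1
    for ch in second:         # then the second string, right after it
        buffer[i] = ch
        i += 1
    return "".join(buffer)    # turn the filled buffer into a new str

print(concat("hello", "world"))   # helloworld
```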

However, this can be slow for big strings, so CPython (but not Jython or IronPython) has an optimization that can *sometimes* apply. If there is only one reference to "hello", and you are concatenating to the end, then CPython can sneakily re-use the space already there by expanding the first string, then copying into the end of it. But this is an implementation-dependent trick, and not something you can rely on.
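Because the trick can't be relied on, code should be correct either way. This sketch prints whether the id happened to survive (don't depend on the answer; here the literal "hello" is also held by the compiled code, so the single-reference condition may not hold) and then shows the portable habit for building long strings: collect the pieces and call str.join() once.

```python
text = "hello"
before = id(text)
text += "world"
# CPython *may* have extended the original in place (same id); other
# implementations build a new object. The value is the same either way.
print(text, id(text) == before)

# Portable way to build a long string from many pieces: join once at the end.
pieces = ["hello"] * 1000
big = "".join(pieces)
print(len(big))                   # 5000
```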



--
Steven

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
