[Python-Dev] Python VM

2008-07-21 Thread Jakob Sievers
Hi,
I've been reading the Python VM sources over the last few afternoons and
I took some notes, which I thought I'd share (and if anyone familiar
with the VM internals could have a quick look at them, I'd really
appreciate it).

Cheers,
-jakob


Unless otherwise noted, the source file in question is Python/ceval.c.

Control Flow
============
The calling sequence is:
main() (in python.c) -> Py_Main() (main.c) -> PyRun_FooFlags() (pythonrun.c) ->
run_bar() (pythonrun.c) -> PyEval_EvalCode() (ceval.c) -> PyEval_EvalCodeEx()
(ceval.c) -> PyEval_EvalFrameEx() (ceval.c).

EvalCodeEx() does some initialization (creating a new execution frame,
argument processing, and some generator-specific stuff I haven't looked at yet)
before calling EvalFrameEx() which contains the main interpreter loop.


Threads
=======
PyEval_InitThreads() initializes the GIL (interpreter_lock) and sets
main_thread to the (threading package dependent) ID of the current thread.
Thread switching is done using PyThreadState_Swap(), which sets
_PyThreadState_Current (both defined in pystate.c). PyThreadState_GET()
(pystate.h) is an alias for _PyThreadState_Current.
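
To make the swap concrete, here is a minimal single-threaded sketch. The
names ThreadState, current_tstate, THREAD_STATE_GET() and
thread_state_swap() are stand-ins invented for this note, and the sketch
leaves out the GIL and all locking; it only shows the "install the new
state, hand back the old one" idea.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PyThreadState; the real struct holds much more. */
typedef struct ThreadState { long thread_id; } ThreadState;

/* Plays the role of _PyThreadState_Current. */
ThreadState *current_tstate = NULL;

/* Plays the role of PyThreadState_GET(): just an alias for the global. */
#define THREAD_STATE_GET() (current_tstate)

/* Install newts as the current thread state and return the previous one. */
ThreadState *thread_state_swap(ThreadState *newts)
{
    ThreadState *old = current_tstate;
    current_tstate = newts;
    return old;
}
```

A caller would typically keep the returned pointer so that it can swap the
old state back in later.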


Async Callbacks
===============
Asynchronous callbacks can be registered by adding the function to be called
to pendingcalls[] (see Py_AddPendingCall()). The state of this queue is
communicated to the main loop via things_to_do.
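
A rough sketch of such a queue, assuming a small ring buffer and a single
flag. The names add_pending_call(), make_pending_calls() and bump() and the
queue size are made up for illustration; the sketch is single-threaded and
omits the re-entrancy guards a real implementation needs.

```c
#include <assert.h>
#include <stddef.h>

#define NPENDING 4   /* illustrative size; one slot always stays empty */

typedef int (*pending_fn)(void *);

struct { pending_fn func; void *arg; } pending[NPENDING];
int pending_first = 0;          /* next entry to run */
int pending_last = 0;           /* next free slot */
volatile int things_to_do = 0;  /* the flag the main loop polls */

/* Queue func(arg); returns 0 on success, -1 if the queue is full. */
int add_pending_call(pending_fn func, void *arg)
{
    int next = (pending_last + 1) % NPENDING;
    if (next == pending_first)
        return -1;
    pending[pending_last].func = func;
    pending[pending_last].arg = arg;
    pending_last = next;
    things_to_do = 1;           /* tell the main loop there is work */
    return 0;
}

/* Run the queued calls in order; stops early if one of them fails. */
int make_pending_calls(void)
{
    things_to_do = 0;
    while (pending_first != pending_last) {
        pending_fn func = pending[pending_first].func;
        void *arg = pending[pending_first].arg;
        pending_first = (pending_first + 1) % NPENDING;
        if (func(arg) < 0) {
            things_to_do = 1;   /* leave the rest for a later pass */
            return -1;
        }
    }
    return 0;
}

/* Example callback: increments the int that arg points to. */
int bump(void *arg)
{
    assert(arg != NULL);
    ++*(int *)arg;
    return 0;
}
```

The main loop only has to test things_to_do, which is cheap, and calls the
drain function when the flag is set.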


State
=====
The global state is recorded in a (per-process?) PyInterpreterState struct and
a per-thread PyThreadState struct.
Each execution frame's state is contained in that frame's PyFrameObject
(which includes the instruction stream, the environment (globals, locals,
builtins, etc.), the value stack and so forth).
EvalFrameEx()'s local variables are initialized from this frame object.


Instruction Stream
==================
The instruction stream looks as follows (cf. assemble_emit() in compile.c):
A byte stream where each instruction consists of either
1) a single byte opcode: OP
2) a single byte opcode plus a two-byte immediate argument: OP LO HI
3) an EXTENDED_ARG prefix carrying the upper two bytes of a four-byte
   argument, followed by the real opcode with the lower two bytes:
   EXTENDED_ARG BYTE2 BYTE1 OP BYTE4 BYTE3 (BYTE1 is the most significant)
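
As an illustration, a small decoder for the first two forms might look like
this (it deliberately leaves EXTENDED_ARG aside). Instr, decode_one() and
count_instructions() are names invented for this sketch; the HAVE_ARGUMENT
threshold mirrors the idea in opcode.h, with its value taken as an
assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Opcodes at or above this threshold carry a two-byte argument.
   The value is an assumption made for this sketch. */
#define HAVE_ARGUMENT 90
#define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)

typedef struct {
    int opcode;
    int oparg;    /* -1 when the instruction has no argument */
    size_t size;  /* bytes consumed: 1 or 3 */
} Instr;

/* Decode the single instruction that starts at code[pos]. */
Instr decode_one(const unsigned char *code, size_t len, size_t pos)
{
    Instr in;
    assert(pos < len);
    in.opcode = code[pos];
    in.oparg = -1;
    in.size = 1;
    if (HAS_ARG(in.opcode)) {
        assert(pos + 2 < len);
        /* low byte first, then high byte */
        in.oparg = code[pos + 1] | (code[pos + 2] << 8);
        in.size = 3;
    }
    return in;
}

/* Count the instructions in a well-formed stream. */
size_t count_instructions(const unsigned char *code, size_t len)
{
    size_t pos = 0, n = 0;
    while (pos < len) {
        pos += decode_one(code, len, pos).size;
        n++;
    }
    return n;
}
```

For the byte stream 100 0x34 0x12 1 83, this yields opcode 100 with
argument 0x1234, followed by two argument-less opcodes.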


Opcode Prediction
=================
One nice trick used to speed up opcode dispatch is the following:
Using the macros PREDICT() and PREDICTED() it is sometimes possible
to jump directly to the code implementing the next instruction
rather than having to go through the whole loop preamble, e.g.

case FOO:
  // ...
  PREDICT(BAR);
  continue;

PREDICTED(BAR);
case BAR:
  // ...

expands to

case FOO:
  // ...
  if (*next_instr == BAR) goto PRED_BAR;
  continue;

PRED_BAR: next_instr++;
case BAR:
  // ...
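
The same trick in a toy interpreter, as a sketch only: the three opcodes,
run() and the hits counter are invented for this example, and the macros
are simplified versions of the expansion shown above.

```c
#include <assert.h>

/* Hypothetical three-instruction machine, not real CPython opcodes. */
enum { OP_PUSH1 = 1, OP_ADD = 2, OP_HALT = 3 };

#define PREDICT(op)   if (*next_instr == (op)) goto PRED_##op
#define PREDICTED(op) PRED_##op: next_instr++

/* Runs code and returns the top of the stack at OP_HALT (-1 on a bad
   opcode). *hits counts how often a prediction skipped the dispatch. */
int run(const unsigned char *code, int *hits)
{
    const unsigned char *next_instr = code;
    int stack[16];
    int *sp = stack;

    for (;;) {
        int opcode = *next_instr++;        /* loop preamble */
        switch (opcode) {
        case OP_PUSH1:
            assert(sp < stack + 16);
            *sp++ = 1;
            PREDICT(OP_ADD);               /* guess: an ADD comes next */
            continue;

        PREDICTED(OP_ADD);                 /* reached only via goto */
            ++*hits;
        case OP_ADD: {
            int a, b;
            assert(sp - stack >= 2);
            b = *--sp;
            a = *--sp;
            *sp++ = a + b;
            continue;
        }

        case OP_HALT:
            return sp > stack ? sp[-1] : 0;

        default:
            return -1;
        }
    }
}
```

Running the program PUSH1 PUSH1 ADD HALT returns 2 with one prediction hit:
the second PUSH1 sees that ADD is next and jumps straight into its body.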


Main Loop
=========

Variables and macros used in EvalFrameEx()
------------------------------------------
The value stack:
  PyObject **stack_pointer;
The instruction stream:
  unsigned char *next_instr;
NEXTOP(), NEXTARG(), PEEKARG(), JUMPTO(), and JUMPBY() simply fiddle
with next_instr. Likewise for TOP(), SET_SECOND(), PUSH(), POP(),
etc. and stack_pointer.
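
A sketch of how such macros behave, modeled on the ones in ceval.c but
simplified: plain ints stand in for PyObject pointers, the byte stream is
made up, and macro_demo() is a name invented for this note.

```c
#include <assert.h>

/* Both macro families assume locals with exactly these names,
   just as EvalFrameEx() declares them. */
#define NEXTOP()       (*next_instr++)
#define NEXTARG()      (next_instr += 2, (next_instr[-1] << 8) + next_instr[-2])
#define PUSH(v)        (*stack_pointer++ = (v))
#define POP()          (*--stack_pointer)
#define TOP()          (stack_pointer[-1])
#define SET_SECOND(v)  (stack_pointer[-2] = (v))

int macro_demo(void)
{
    /* Made-up stream: opcode 100 with argument 7 (LO=7, HI=0). */
    const unsigned char code[] = {100, 7, 0};
    const unsigned char *next_instr = code;
    int stack[4];                      /* ints stand in for PyObject * */
    int *stack_pointer = stack;
    int opcode, oparg, top, second;

    opcode = NEXTOP();                 /* 100; next_instr is at the argument */
    oparg = NEXTARG();                 /* 7; next_instr is past the instruction */
    assert(next_instr == code + 3);

    PUSH(opcode);                      /* stack: 100 */
    PUSH(oparg);                       /* stack: 100 7 */
    SET_SECOND(TOP() + 1);             /* stack: 8 7 */

    top = POP();                       /* 7 */
    second = POP();                    /* 8 */
    assert(stack_pointer == stack);    /* the stack is empty again */
    return second * 10 + top;          /* 87 */
}
```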

Current opcode plus argument:
  int opcode;
  int oparg;

Error status:
  enum why_code why; // no, exn, exn re-raised, return, break, continue, yield
  int err;   // non-zero is error

The environment:
  PyObject *names;
  PyObject *consts;
and
  PyObject **fastlocals;
which is accessed via GETLOCAL() and SETLOCAL().

Finally, there are some more PyObject *'s (v, w, u, and so forth, used
as temporary variables) as well as
  PyObject *retval;


Basic structure
---------------
EvalFrameEx() {
    why = WHY_NOT;
    err = 0;

    for (;;) {                       <--+---+
        // do periodic tasks            |   |
                                        |   |
    fast_next_opcode:                   |   |
        opcode = NEXTOP();              |   |
        if (HAS_ARG(opcode))            |   |
            oparg = NEXTARG();          |   |
                                        |   |
    dispatch_opcode:                    |   |
        switch(opcode) {                |   |
                                        |   |
            continue; ------------------+   |
                                            |
            break; ---------------------+   |
                                        |   |
            // Also, opcode prediction  |   |
            // jumps around inside the  |   |
            // switch statement         |   |
                                        |   |
        }                           <---+   |
                                            |
    on_error:                               |
        // no error: continue --------------+
        // otherwise why == WHY_EXCEPTION after this

    fast_block_end:
        // unwind stacks if there was an error
    }

    // more unwinding

fast_yield:
    // reset current thread's exception info
exit_eval_frame:
    // set thread's execution frame to previous execution frame
    return retval;
}
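
A compilable miniature of this skeleton, under heavy simplification: the
four opcodes, the reduced why_code enum and eval() are invented for the
sketch, there is no value stack (just an accumulator), and the block-stack
unwinding is reduced to leaving the loop.

```c
#include <assert.h>
#include <stddef.h>

enum why_code { WHY_NOT, WHY_EXCEPTION, WHY_RETURN };   /* subset only */
enum { OP_NOP, OP_INC, OP_FAIL, OP_RETURN };            /* hypothetical */

/* Returns the accumulator on OP_RETURN and -1 on error; *why_out
   reports why the loop was left. */
int eval(const unsigned char *code, enum why_code *why_out)
{
    const unsigned char *next_instr = code;
    enum why_code why = WHY_NOT;
    int err = 0;
    int acc = 0;
    int retval = -1;

    assert(code != NULL && why_out != NULL);

    for (;;) {
        int opcode = *next_instr++;
        switch (opcode) {
        case OP_NOP:
            continue;             /* fast path: back to the top of the loop */
        case OP_INC:
            acc++;
            continue;
        case OP_FAIL:
            err = 1;
            break;                /* leave the switch, reach the error check */
        case OP_RETURN:
            retval = acc;
            why = WHY_RETURN;
            break;
        default:
            err = 1;
            break;
        }

        /* on_error: */
        if (why == WHY_NOT) {
            if (err == 0)
                continue;         /* no error: next instruction */
            why = WHY_EXCEPTION;
        }

        /* fast_block_end: a real frame would unwind its block stack here */
        break;                    /* why != WHY_NOT, so leave the for loop */
    }

    if (why != WHY_RETURN)
        retval = -1;
    *why_out = why;
    return retval;
}
```

The two exits from the switch correspond to the two arrows in the diagram:
continue skips the error check entirely, while break falls through to it.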

Periodic Tasks
--------------
By checking and decrementing _Py_Ticker, the main loop 

Re: [Python-Dev] Python VM

2008-07-22 Thread Jakob Sievers
I added a page to wiki.python.org:
  http://wiki.python.org/moin/CPythonVmInternals
This incorporates most of Martin v. Löwis's additions (except for a few
bits which I need to look into more).

In any case, thanks for the feedback!
Cheers,
-jakob
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [ANN] VPython 0.1

2008-10-24 Thread Jakob Sievers
[EMAIL PROTECTED] writes:

> BTW, as to the implementation of individual VM instructions I don't believe
> the Vmgen stuff affects that.  It's just the way the instructions are
> assembled.

Vmgen handles the pushing and popping as well. E.g. ROT_THREE becomes:

  rot_three ( a1 a2 a3 -- a3 a1 a2 )
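
For readers unfamiliar with the stack-effect notation, a hand-written C
equivalent of that declaration could look like the following. This is not
actual Vmgen output; the function signature and the use of ints instead of
PyObject pointers are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* ( a1 a2 a3 -- a3 a1 a2 ): the rightmost name is the top of the stack.
   sp points one past the top item. */
void rot_three(int *sp)
{
    int a1, a2, a3;
    assert(sp != NULL);
    a1 = sp[-3];                 /* fetch the inputs */
    a2 = sp[-2];
    a3 = sp[-1];
    sp[-3] = a3;                 /* store the outputs */
    sp[-2] = a1;
    sp[-1] = a2;
}
```

The old top item ends up third from the top, and the other two each move
up one position.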

BINARY_POWER is:

  binary_power ( a1 a2 -- a  dec:a1 dec:a2  next:a )
  a = PyNumber_Power(a1, a2, Py_None);

(Here I have abused Vmgen a bit by declaring, in addition to the actual
value stack, some dummy stacks with different stack prefixes and using
the ``push'' instructions generated for those to do reference
counting.)

I should mention that some of the more involved instructions have no
declared effect (i.e. ( -- ) ) with stack manipulation still being
done by hand. 

Cheers,
-jakob



Re: [Python-Dev] [ANN] VPython 0.1

2008-10-24 Thread Jakob Sievers
[EMAIL PROTECTED] writes:

> Guido> This is very interesting (at this point I'm just lurking), but
> Guido> has anyone pointed out yet that there already is something else
> Guido> called VPython, which has a long standing "right" to the name?
>
> I believe Jakob has already been notified about this.  How about TPython?  A
> quick google-check suggests that while there is at least one instance of
> that name in use as related to Python, it seems to be fairly obscure and is
> perhaps only used internally at CERN.
>

TPython it is!

Cheers,
-jakob



Re: [Python-Dev] [ANN] VPython 0.1

2008-10-26 Thread Jakob Sievers
"Phillip J. Eby" <[EMAIL PROTECTED]> writes:

> At 10:47 AM 10/24/2008 +0200, J. Sievers wrote:
>>  - Right now, CPython's bytecode is translated to direct threaded code
>>  lazily (when a code object is first evaluated). This would have to
>>  be merged into compile.c in some way plus some assorted minor changes.
>
> Don't you mean codeobject.c?  I don't see how the compiler relates, as
> Python programs can generate or transform bytecode.  (For example,
> Zope's Python sandboxing works that way.)
>

Also good :).
(I was thinking about the superinstruction selection code which should
perhaps go into optimize_code() since it's a kind of peephole
optimization. The bytecodes->addresses part might even stay in ceval.c
I guess).

-jakob
