This is another testbed for prototyping the lightweight dynamic translation
scheme in plex86.

You can tweak parameters at the top of 'dt.h' before compiling,
then just type 'make'.  I will be adding more parameters and
modeling to this code over time.

Guest code is actually generated in 'guest.c'.  I will also
be adding different kinds of guest code sequences to that file.

The first rev of this guest code exercises static out-of-page
branches in nearly the worst possible scenario.  I generate
code which weaves a path branching back and forth between two
arbitrary guest pages.  At the end of the path, the process is
repeated, as controlled by 'DT_LOOP_COUNT'.
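
To make the weaving pattern concrete, here is a minimal sketch of how
such a path could be generated.  It is not the code in 'guest.c'.  It
assumes the branches are plain x86 'jmp rel32' instructions, and the
names 'guest_pages_t', 'emit_jmp', 'gen_weave', 'walk_weave' and
'MAX_JUMPS' are all made up for illustration.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u
#define JMP_LEN   5u     /* x86 'jmp rel32': opcode 0xE9 plus 4-byte displacement */
#define MAX_JUMPS 1600u  /* keeps every jump and the final 'ret' inside its page */

/* Hypothetical model of the two guest pages; not the layout used by guest.c. */
typedef struct {
    uint32_t base[2];            /* guest address of page A and page B */
    uint8_t  mem[2][PAGE_SIZE];  /* page contents */
} guest_pages_t;

/* Emit 'jmp rel32' at page/off so that it lands on guest address 'to'. */
static void emit_jmp(guest_pages_t *g, int page, uint32_t off, uint32_t to)
{
    uint32_t next = g->base[page] + off + JMP_LEN;  /* address after the jmp */
    uint32_t rel  = to - next;                      /* wraps modulo 2^32 */
    uint8_t *p = &g->mem[page][off];

    p[0] = 0xE9;
    p[1] = (uint8_t)(rel);        /* displacement is stored little-endian */
    p[2] = (uint8_t)(rel >> 8);
    p[3] = (uint8_t)(rel >> 16);
    p[4] = (uint8_t)(rel >> 24);
}

/* Generate 'n' jumps alternating A -> B -> A ..., ending in 'ret' (0xC3).
 * Returns the guest address of the entry point (start of page A). */
static uint32_t gen_weave(guest_pages_t *g, unsigned n)
{
    uint32_t off[2] = { 0, 0 };
    int page = 0;
    unsigned i;

    if (n > MAX_JUMPS)
        n = MAX_JUMPS;
    for (i = 0; i < n; i++) {
        int other = page ^ 1;
        emit_jmp(g, page, off[page], g->base[other] + off[other]);
        off[page] += JMP_LEN;
        page = other;
    }
    g->mem[page][off[page]] = 0xC3;
    return g->base[0];
}

/* Follow the chain from 'entry' and count the out-of-page branches taken. */
static unsigned walk_weave(const guest_pages_t *g, uint32_t entry)
{
    uint32_t addr = entry;
    unsigned taken = 0;

    for (;;) {
        int page = (addr - g->base[0] < PAGE_SIZE) ? 0 : 1;
        uint32_t off = addr - g->base[page];
        const uint8_t *p;
        uint32_t rel;

        if (off > PAGE_SIZE - JMP_LEN)
            break;                       /* outside both pages */
        p = &g->mem[page][off];
        if (p[0] != 0xE9)
            break;                       /* reached the 'ret' */
        rel = (uint32_t)p[1] | ((uint32_t)p[2] << 8) |
              ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
        addr += JMP_LEN + rel;
        taken++;
    }
    return taken;
}
```

Every instruction on the generated path is an out-of-page branch, which
is what makes this pattern close to a worst case for the branch
handling code.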

I did this purposely to magnify the overhead imposed by the branch
handling shims and code as much as possible, so that we can work
on trimming them down by trying various techniques.  This is nowhere
near realistic guest code, but it's useful for development.

Some results thus far, based on the command 'time ./dt', are below.
Once this is trimmed down, I may model some more realistic guest
code.

  Native: 0.72 seconds


   DT    Slowdown factor
 54.68s  76:1              Wed Jan 10 17:04:06 EST 2001
                           First effort.  Shims require that DT
                           code be constrained to a given SS, so
                           that SS can be used to save state.  All
                           branches vectored through handler routine.

 46.90s  65:1              Fri Jan 12 00:29:15 EST 2001
                           Shims assume GS is virtualized and used as a
                           data segment for the tcode.  This lets us
                           save guest state more efficiently.  Branches
                           still going through handler routine.

  5.22s   7:1              Added dynamic direct branch backpatching.
                           The handler routines patch in the address
                           of the tcode at the branch target address
                           dynamically.  The generated code first examines
                           an inline ID token and compares it to a global
                           ID token, which increments for each context
                           switch (change of the page tables etc).
                           As long as there is a match, the direct branch
                           is taken.  This is a simple and fairly efficient
                           way to have branches revalidate direct linkage
                           to other tcode sequences across context switches,
                           while maintaining no branch tables of any kind.


Considering that the pseudo-guest code does nothing but thrash
with out-of-page branches (weaving back and forth), has no in-page
branches (which would be 1:1) or tight-loop code cycles, and never
lets the pipelines fill, the last number is pretty good.  This is
an exercise of pure overhead.

Essentially, each static out-of-page branch uses the branch handler
routines only once after every context switch, when a new binding is
built.  For fun, I may add a signal handler and timer interrupt
to increment the context switch token.
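
The ID token check behind the backpatched branches can be modeled in
plain C roughly as follows.  This is only a sketch of the idea: the
real tcode is generated machine code with the token and target
inlined, and the names 'branch_site_t', 'translate_target',
'branch_handler', 'take_branch' and 'context_switch' are assumptions
made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One static out-of-page branch site in the tcode (hypothetical layout). */
typedef struct {
    uint32_t    guest_target;  /* guest address the branch goes to */
    uint32_t    id_token;      /* inline ID token, stamped when patched */
    const void *tcode;         /* backpatched address of the target tcode */
} branch_site_t;

static uint32_t global_id_token = 1;  /* incremented on each context switch */
static unsigned handler_calls;        /* counts trips through the slow path */

/* Stand-in for the translator's lookup of tcode for a guest address. */
static const void *translate_target(uint32_t guest_addr)
{
    static const char tcode_pool[16];
    return &tcode_pool[guest_addr & 0xF];
}

/* Slow path: build a new binding and stamp it with the current token. */
static const void *branch_handler(branch_site_t *s)
{
    handler_calls++;
    s->tcode    = translate_target(s->guest_target);
    s->id_token = global_id_token;
    return s->tcode;
}

/* What the generated code does at each branch: compare tokens, then
 * either take the direct branch or fall back to the handler. */
static const void *take_branch(branch_site_t *s)
{
    if (s->id_token == global_id_token)
        return s->tcode;          /* binding still valid: direct branch */
    return branch_handler(s);     /* stale or never patched: rebind */
}

/* A context switch (change of page tables etc.) invalidates all
 * bindings at once, without walking any table of branch sites. */
static void context_switch(void)
{
    global_id_token++;
}
```

A site that starts with 'id_token' set to 0 misses on its first use,
gets patched, and then takes the direct path until 'context_switch()'
is called.  A real implementation would also have to decide what
happens when the token wraps around.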

Now, to play with dynamic (computed) branches.  If branch target
lookups can be done with reasonable efficiency, I think we have
something.
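
One possible shape for such a lookup, purely as an illustration and
not something this testbed implements, is a small direct-mapped cache
keyed by guest address.  The table size, hash function and all names
below ('lookup_entry_t', 'lookup_hash', 'lookup_insert',
'lookup_find', 'lookup_flush') are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOOKUP_BITS 8
#define LOOKUP_SIZE (1u << LOOKUP_BITS)

/* One slot of the hypothetical branch target cache. */
typedef struct {
    uint32_t    guest_addr;  /* tag: full guest address of the target */
    const void *tcode;       /* translated code for it, NULL if empty */
} lookup_entry_t;

static lookup_entry_t lookup_cache[LOOKUP_SIZE];

/* Mix the page number into the low bits so the same offset in
 * different pages tends to land in different slots. */
static unsigned lookup_hash(uint32_t guest_addr)
{
    return (guest_addr ^ (guest_addr >> 12)) & (LOOKUP_SIZE - 1);
}

/* Record a translation, evicting whatever occupied the slot. */
static void lookup_insert(uint32_t guest_addr, const void *tcode)
{
    lookup_entry_t *e = &lookup_cache[lookup_hash(guest_addr)];
    e->guest_addr = guest_addr;
    e->tcode      = tcode;
}

/* Return the tcode for a computed branch target, or NULL on a miss
 * (in which case the caller would fall back to the handler). */
static const void *lookup_find(uint32_t guest_addr)
{
    const lookup_entry_t *e = &lookup_cache[lookup_hash(guest_addr)];
    if (e->tcode != NULL && e->guest_addr == guest_addr)
        return e->tcode;
    return NULL;
}

/* Drop every entry, e.g. on a context switch. */
static void lookup_flush(void)
{
    memset(lookup_cache, 0, sizeof lookup_cache);
}
```

A hit costs one hash, one load and one compare.  Whether that is
"reasonable efficiency" for computed branches is exactly what would
need to be measured.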

One thing I wonder about is how, with a limited code cache (however
big we end up making it), infrequently used code paths will compete
for space with very active ones.  Our translation will be quite
lightweight, so retranslating is not as bad as it could be,
but perhaps it will pay to associate a hit-count with
each code page, and strictly emulate within that page until
a certain threshold is reached, and/or inline similar instrumentation
in the tcode.
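
As a rough sketch of the hit-count idea (nothing like this exists in
the testbed yet), a per-page counter could gate translation as shown
below.  The threshold value and the names 'HOT_THRESHOLD',
'page_stat_t', 'run_mode_t' and 'enter_page' are hypothetical.

```c
#include <stdint.h>

#define HOT_THRESHOLD 4u  /* hypothetical; would need tuning */

/* Per-code-page bookkeeping. */
typedef struct {
    uint32_t page;        /* guest page number */
    unsigned hits;        /* times execution has entered this page */
    int      translated;  /* nonzero once tcode exists for the page */
} page_stat_t;

typedef enum { RUN_EMULATED, RUN_TRANSLATED } run_mode_t;

/* Called whenever execution enters a code page.  The page is strictly
 * emulated until it has been entered HOT_THRESHOLD times, at which
 * point it is marked for translation and runs as tcode from then on. */
static run_mode_t enter_page(page_stat_t *p)
{
    if (p->translated)
        return RUN_TRANSLATED;
    if (++p->hits >= HOT_THRESHOLD) {
        p->translated = 1;
        return RUN_TRANSLATED;
    }
    return RUN_EMULATED;
}
```

With this scheme, rarely used pages never reach the threshold and so
never take up space in the code cache.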

Anyways, I like how things look so far...

-Kevin