Re: [plex86] Performance enhancement: elminiating mode and contextswitches

Kevin Lawton Mon, 18 Dec 2000 06:26:04 -0800
Ramon van Handel wrote:

> > Once you modify the instructions in a page by extending the size
> > of an instruction (changing an IO to a call), as opposed to
> > inserting an INT3 (always 1 byte), we have to move from our notion
> > of simple modified cache pages to a more dynamic translation like
> > scheme.  The branch offsets change etc.
> 
> No, not necessarily.  What you do is overwrite the next instruction and
> keep the original in a branch table.  You use a call to go to the
> emulation routine; instead of using ret, however, the emulation routine
> will look in the branch table, which contains (1) the next instructions to
> be executed, and (2) the address of the first instruction that was not
> overwritten.

Sounds good.  I think this has good potential for virtualizing branch
instructions.  I see what you mean about virtualizing other instructions
which are shorter than 5 bytes.  Stepping on downstream instructions
means either generating dynamic code for arbitrary instructions, or
accessing emulation code.  The first option is a lot of work.  The
second option is not so good from a run-it-in-ring3 perspective.

We'd have to work out some issues with this method, including
placing handling code somewhere in the guest CS segment range,
generating PIC code for the receiving functions, etc.
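
To make the insert-call idea concrete, here is a rough C sketch of the
patching step, run on a copy of a guest code page.  Everything in it
(struct branch_entry, plant_call, MAX_SAVED, the flat 32-bit addresses)
is a hypothetical illustration, not plex86 code.  It assumes the
prescanner already knows the instruction lengths, and it ignores the
case where some other branch targets one of the overwritten bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CALL_LEN  5    /* near call: opcode 0xE8 followed by a rel32 */
#define MAX_SAVED 16   /* hypothetical cap on bytes saved per patch */

/* One branch-table entry (hypothetical layout). */
struct branch_entry {
    uint32_t patch_addr;        /* guest address of the planted call */
    uint8_t  saved[MAX_SAVED];  /* original bytes that were overwritten */
    size_t   saved_len;         /* how many bytes were saved */
    size_t   victim_len;        /* length of the virtualized instruction;
                                   the rest of saved[] is downstream code */
    uint32_t resume_addr;       /* first instruction not overwritten */
};

/*
 * Overwrite the instruction at code[off] (and as many following
 * instructions as needed to make room) with a call to handler_addr.
 * insn_len[] holds the lengths of the instructions starting at off.
 * Returns 0 on success, -1 if the patch cannot be planted.
 */
int plant_call(uint8_t *code, size_t code_len, uint32_t base, size_t off,
               const uint8_t *insn_len, size_t n_insn,
               uint32_t handler_addr, struct branch_entry *e)
{
    size_t covered = 0;
    size_t i;

    assert(code != NULL && insn_len != NULL && e != NULL);
    if (n_insn == 0)
        return -1;

    /* Swallow whole instructions until the 5-byte call fits. */
    for (i = 0; i < n_insn && covered < CALL_LEN; i++)
        covered += insn_len[i];
    if (covered < CALL_LEN || covered > MAX_SAVED ||
        off > code_len || covered > code_len - off)
        return -1;

    e->patch_addr  = base + (uint32_t)off;
    e->victim_len  = insn_len[0];
    e->saved_len   = covered;
    e->resume_addr = e->patch_addr + (uint32_t)covered;
    memcpy(e->saved, code + off, covered);

    /* rel32 is relative to the address just past the call itself. */
    uint32_t rel = handler_addr - (e->patch_addr + CALL_LEN);
    code[off]     = 0xE8;
    code[off + 1] = (uint8_t)(rel & 0xFF);
    code[off + 2] = (uint8_t)((rel >> 8) & 0xFF);
    code[off + 3] = (uint8_t)((rel >> 16) & 0xFF);
    code[off + 4] = (uint8_t)((rel >> 24) & 0xFF);

    /* Fill any leftover bytes of the last swallowed instruction with NOPs. */
    memset(code + off + CALL_LEN, 0x90, covered - CALL_LEN);
    return 0;
}
```

The emulation routine would then, instead of doing a ret, run (or
emulate) the downstream bytes in saved[] past victim_len and continue
at resume_addr.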

Though here's one of the optimizations I had in mind.  What do
you think of this?

  Given that we are prescanning (which would be on for guest kernel
  code) and running guest code in the safety of ring3, perhaps we
  could create a special ring3 code segment which contains the
  handling code to emulate branch and other instructions, and a ring3
  interrupt gate in the monitor IDT which lets that handling code
  deal with certain interrupts.  One natural IDT slot would be the
  one for the INT3 instruction, which is what we already plant on
  virtualized instructions.  A ring3 handler could look up the real
  instruction, and either emulate it if it is simple, or defer to the
  monitor at ring0 if not.  Branch instructions could be emulated
  here, and perhaps reads of segment selectors and other simple
  instructions.  Complicated instructions would be handled by the
  emulation mechanisms we already have in the monitor, in which case
  we would have to step up to ring0.

  Anyways, the idea is to spend less time getting to the handler, save
  less guest state, etc.  I'm not sure how much gain there is in this
  in the end, after all the necessary framework is put into it.
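
As a rough illustration of the dispatch such a ring3 INT3 handler
would do, here is a host-side C model.  It is only a sketch: the
types and names (virt_insn, guest_ctx, int3_ring3_handler) and the
linear lookup table are assumptions made for the example, not
existing plex86 structures, and a real handler would work from the
interrupt stack frame rather than a plain struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Kinds of virtualized instruction the prescanner recorded (hypothetical). */
enum virt_kind { VIRT_JMP_REL, VIRT_READ_SEG, VIRT_COMPLEX };

struct virt_insn {
    uint32_t       addr;  /* guest address where the INT3 was planted */
    enum virt_kind kind;
    uint8_t        len;   /* length of the original instruction */
    int32_t        rel;   /* displacement, used by VIRT_JMP_REL only */
};

/* Minimal slice of guest state the handler touches (hypothetical). */
struct guest_ctx {
    uint32_t eip;  /* as saved by the trap: just past the INT3 byte */
    uint32_t eax;
    uint16_t ds;   /* selector value the guest should see */
};

enum outcome { HANDLED_RING3, DEFER_RING0 };

static const struct virt_insn *
lookup(const struct virt_insn *tab, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].addr == addr)
            return &tab[i];
    return NULL;
}

/* Emulate simple instructions in place; hand everything else to ring0. */
enum outcome int3_ring3_handler(struct guest_ctx *g,
                                const struct virt_insn *tab, size_t n)
{
    /* INT3 is a trap, so the saved EIP is one byte past the 0xCC. */
    uint32_t at = g->eip - 1;
    const struct virt_insn *vi = lookup(tab, n, at);

    if (vi == NULL)
        return DEFER_RING0;          /* not ours: let the monitor decide */

    switch (vi->kind) {
    case VIRT_JMP_REL:
        /* Target is relative to the end of the original instruction. */
        g->eip = at + vi->len + (uint32_t)vi->rel;
        return HANDLED_RING3;
    case VIRT_READ_SEG:
        /* Model of a 16-bit read of DS into AX. */
        g->eax = (g->eax & 0xFFFF0000u) | g->ds;
        g->eip = at + vi->len;
        return HANDLED_RING3;
    default:
        return DEFER_RING0;          /* complicated: step up to ring0 */
    }
}
```

Whether this wins anything depends on how much cheaper the ring3
path really is than the existing trip into the monitor.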

Though I do like Ramon's simpler insert-call technique, especially
because the call target code can be customized for a given
guest instruction and thus be quite simple.  It's worth looking at
the specifics more.


-Kevin

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Kevin Lawton                        [EMAIL PROTECTED]
MandrakeSoft, Inc.                  Plex86 developer
http://www.linux-mandrake.com/      http://www.plex86.org/