On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote:
> This option allows pausing QEMU in the new RUN_STATE_PRECONFIG state,
> allowing the configuration of QEMU from QMP before the machine jumps
> into board initialization code of machine_run_board_init()
> 
> Intent is to allow management to query machine state and additionally
> configure it using previous query results within one QEMU instance
> (i.e. eliminate need to start QEMU twice, 1st to query board specific
> parameters and 2nd for actual VM start using query results for
> additional parameters).
> 
> New option complements -S option and could be used with or without
> it. Difference is that -S pauses QEMU when machine is completely
> build with all devices wired up and ready run (QEMU need only to
> unpause CPUs to let guest execute its code).
> And "preconfig" option pauses QEMU early before board specific init
> callback (machine_run_board_init) is executed and will allow to
> configure machine parameters which will be used by board init code.
> 
> When early introspection/configuration is done, command 'cont' should
> be used to exit RUN_STATE_PRECONFIG and transition to the next
> requested state (i.e. if -S is used then QEMU will pause the second
> time when board/device initialization is completed or start guest
> execution if -S isn't provided on CLI)
> 
> PS:
> Initially 'preconfig' is planned to be used for configuring numa
> topology depending on board specified possible cpus layout.
> 
> Signed-off-by: Igor Mammedov <imamm...@redhat.com>

TL;DR: I was against this approach of adding a new "preconfig"
state and thought "-S" ought to be enough, but I'm now convinced
this is the best option we have.


Long version:

So, I was skeptical of this approach initially, because I thought
"machine->init() was run" and "machine->init() was not run yet"
is supposed to be internal QEMU state that no external component
should care about at all, because the vCPUs are not running yet.

In other words, if vCPUS were not started yet, we should be able
to reconfigure anything, and "-S" ought to be enough to what we
want.

...in theory.  In practice this is messy:

Currently initialization works this way:

  void vm_start()  /* this is delayed if -S is used */
  {
      resume_all_vcpus();
  }

  void qmp_cont()  /* "cont" command */
  {
      /* ... */
      vm_start();
  }
  
  void main()
  {
      /* ... */
      machine_run_board_init()
      if (autostart) {  /* -S option sets autotstart = 0 */
          vm_start();
      }
      main_loop();  /* QMP becomes available here */
  }

Then we would have to either do this:

  void vm_start()
  {
      machine_run_board_init()  /* <---- HERE */
      resume_all_vcpus();
  }

  void main()
  {
      /* ... */
      /* machine_run_board_init() moved from here */
      if (autostart) {
          vm_start();
      }
      main_loop();
  }

...and fix every single QMP command to not break if
machine_run_board_init() wasn't called yet.

I don't think that's feasible.


Or we could do this:

  void vm_start()
  {
      configure_numa()  /* <---- HERE */
      resume_all_vcpus();
  }

  void main()
  {
      /* ... */
      machine_run_board_init();
      if (autostart) {
          vm_start();
      }
      main_loop();
  }

...and slowly move code from machine_run_board_init() to
vm_start() (like configure_numa() above).

That's how I expected us to implement the NUMA QMP configuration
stuff.

But, really, the data and ordering dependencies we have in
machine initialization is insane, and simply moving
configure_numa() after machine_run_board_init() would require
moving almost all of machine_run_board_init() inside vm_start().

In practice this would be more complex than moving
machine_run_board_init() completely inside vm_start().  I don't
think that's feasible.

So I'm OK with your approach.

Now I will review the actual code in a separate e-mail.  :)

-- 
Eduardo

Reply via email to