On 10/26/2021 10:06 AM, Chuck Zmudzinski wrote:
On 10/25/2021 4:45 PM, Chuck Zmudzinski wrote:
On 10/23/2021 11:11 AM, Hans van Kranenburg wrote:
Hi!
On 5/10/2021 1:33 PM, Chuck Zmudzinski wrote:
[...] with buster and bullseye running as the Dom0, I can only get
the VGA/Passthrough feature to work with Windows Xen HVMs. I would
expect both Windows and Linux HVMs to work comparably well.
A possible time-saver that I can recommend is to already send a
post to the upstream xen-users list [0] about this. Like "Hi all,
I'm starting an HVM Linux domU with Linux 5.10.70 on a Xen 4.14.3
system that also has a 5.10.70 dom0 kernel, with this and this domU
config file. It fails to start, this is the xl -vvv create output,
and this error (the irq stuff) appears in the dom0 kernel log.".
Try to keep it simple and not too long initially, without the
surrounding stories, to increase the chance of it being fully read.
I can do this soon - I have some more interesting tests to share
here and with the Xen developers upstream.
I will need to think a little about how to present this bug to
the Xen upstream developers in a short and simple enough way
for them to be likely to read it initially. For now, I will report here
some results from the journal log entries of both Bullseye dom0
and Bullseye domU for two different configurations. These logs
were not generated with the -vvv option, but they already provide
quite a bit of interesting information and are somewhat
overwhelming as it is, so I will hold off on making them even more
verbose for now.
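For reference, the more verbose output mentioned above would come
from passing the global -vvv flag to xl when creating the domain,
for example (the config file path here is just a placeholder):

    xl -vvv create /etc/xen/bullseye-hvm.cfg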
The intention of this message is to provide detailed logs for a
detailed analysis of the problem, not to describe the problem
in simple terms.
A few days ago I ran two tests, and I have four different log
files attached from those tests. In both tests, the Bullseye
HVM was configured for PCI/IGD passthrough using the
domain config file and preparation for passthrough in dom0
described in the earlier message #31:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988333#31
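For readers without message #31 at hand: the dom0-side preparation
essentially amounts to making the IGD assignable to the domU. A
minimal sketch, assuming the IGD is at its usual 00:02.0 address:

    xl pci-assignable-add 00:02.0

The full details, including the domain config file, are in #31.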
The two tests were:
1. Bullseye dom0, Debian 11.1 / Bullseye HVM domU, Debian 11.1
This first test essentially confirmed that the updated versions
of the packages for both Bullseye dom0 and Bullseye domU
since the original report five months ago do not fix the
problem. In this test case, I am using all the official packages
of Debian 11.1 (Bullseye).
It is important to note that the version of the device
model used in this test is the official upstream version
of qemu for Bullseye. On Debian, Xen by default uses the
qemu-system-i386 binary from the qemu-system-x86 package, and
Bullseye currently ships qemu version 5.2+dfsg-11+deb11u1 as the
default device model.
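To confirm which device model binary and version are in use on
Bullseye, something like this works:

    dpkg -s qemu-system-x86 | grep '^Version'
    /usr/bin/qemu-system-i386 --version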
I attached two log files from this test:
qemu-upstream-hvm.txt and qemu-upstream-dom0.txt.
They are the logged journal entries for the Bullseye HVM
and Bullseye dom0 domains, respectively. They are fairly
complete logs, showing the kernel version running in both
the dom0 and the HVM, the kernel command line for both
the dom0 and the domU, the command that was used to
create the HVM domain, etc.
One might recall that in the original report I said it was
difficult to capture logs from the domU, but this time I was able
to capture the log by waiting a few minutes before shutting it
down. I also discovered, in contrast to what I said in the earlier
report, that it is possible to gracefully shut down the domU with
xl shutdown <dom id>, provided one waits long enough before trying;
the shutdown takes a few minutes instead of the normal few seconds
because of the problems caused by this configuration. By waiting
for the graceful shutdown instead of using xl destroy <dom id>, I
was able to view the log of the attempted boot in the domU on a
subsequent normal boot (without PCI passthrough) using journalctl,
and capture some useful call traces.
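For anyone repeating this, the failed boot's kernel messages can be
read from the domU journal after the subsequent normal boot with
something like:

    journalctl --list-boots
    journalctl -b -1 -k

where -b -1 selects the previous boot; this requires persistent
journal storage in the domU.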
For this first test, although the domain eventually shuts down
gracefully, it never gets to the point where one can log in, either
at the terminal or remotely via ssh. The boot messages were
displayed on the passed-through video device, but only very slowly:
it took almost two minutes before the boot messages started to
appear, and after issuing the xl shutdown command in dom0 it took a
couple more minutes before the passed-through video device
indicated that the HVM domain had shut down and powered off.
The second test:
2. Same as the first test, except using the qemu traditional device
model instead of the upstream qemu model, which on Debian comes
from the qemu-system-x86 package.
I also attached two log files from this test:
qemu-traditional-hvm.txt and qemu-traditional-dom0.txt,
and these also are fairly complete logs showing the kernel
version in use, etc.
Since Debian does not provide the traditional device model,
I had to build it from xenbits.xen.org:
https://xenbits.xen.org/gitweb/?p=qemu-xen-traditional.git;a=shortlog;h=refs/heads/stable-4.14
I also had to build a modified hvmloader with rombios support
as required by the traditional qemu device model, and that can
be done fairly easily with a slight modification to the build of the
xen-utils-4.14 binary package for amd64.
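Roughly, and glossing over the details of the Debian packaging, the
build is along these lines (a sketch, not the exact commands; the
configure flags are the ones provided by the upstream Xen tools
build system):

    git clone -b stable-4.14 \
        https://xenbits.xen.org/git-http/qemu-xen-traditional.git
    # In the Xen 4.14 source tree, enable rombios and the
    # traditional device model before rebuilding the tools:
    ./configure --enable-rombios --enable-qemu-traditional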
This device model and rombios are invoked by uncommenting the
device_model_version = 'qemu-xen-traditional' line in the domain
configuration file after installing the updated hvmloader file with
rombios support and the qemu-dm binary as
/usr/lib/xen-4.14/boot/hvmloader and
/usr/lib/xen-4.14/bin/qemu-dm, respectively.
I accomplished this by creating a binary Debian package called
xen-qemu-traditional-4.14 which installs these two files and
diverts the official hvmloader binary in xen-utils-4.14 to
hvmloader.norombios.
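The diversion uses the standard dpkg mechanism; in the package's
preinst it would look something like this:

    dpkg-divert --package xen-qemu-traditional-4.14 \
        --divert /usr/lib/xen-4.14/boot/hvmloader.norombios \
        --rename /usr/lib/xen-4.14/boot/hvmloader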
I verified my build of qemu-xen-traditional is correct
enough to successfully pass through the PCI devices,
including the Intel IGD, to a Windows 10 HVM using the
traditional qemu device model and a Bullseye Xen dom0.
In this configuration, the Bullseye HVM booted quickly and I was
able to log in to it remotely via ssh, which shows the crash is not
nearly as catastrophic as in the case of the upstream qemu device
model. However, there is no output on the display, and there is
still a crash and call trace with this test, but *only* in the
domU.
In this test, it was the i915 kernel module that crashed
in the domU, and there is some useful information in
the attached qemu-traditional-hvm.txt log file that should
help diagnose the problem. This is in contrast to the first
test with the upstream device model where a call trace of a
crash appears in the journal of *both* the domU and the dom0.
Another key difference between the two tests is that in the first
test with the upstream qemu device model, the crash indicates a
failure by the Xen hypervisor and/or Linux kernel to handle an IRQ,
rather than the failure in the i915 kernel module that occurs in
the second test with the traditional qemu device model.
It is not surprising that the behavior of the HVM domU depends not
only on the hypervisor version but also on the qemu device model
version, because the virtual firmware seen by the domU is provided
by both the hypervisor and the device model running in dom0, and it
also differs with the bios used: rombios for the traditional device
model and seabios for the upstream device model. So many different
components make it take a while to narrow down the problem.
The logs contain some explanatory comments and have been redacted
to remove private data such as MAC addresses and IP addresses.
All the best,
Chuck
Two more clarifications that may be needed in order to repeat the
two tests described in the previous message (both are illustrated
by the snippets after the list):
1) To use the traditional device model, it is also necessary to
comment out the device_model_version = 'qemu-xen' line in
the domain configuration file in addition to uncommenting the
device_model_version = 'qemu-xen-traditional' line.
2) In both tests, the command line for the hypervisor, Debian
version 4.14.3-1~deb11u1, was:
Command line: placeholder dom0_mem=3G,max:3G smt=false pv-l1tf=false iommu=1
My system boots grub via UEFI without secure boot, and grub boots
the xen-4.14-amd64.gz hypervisor file from the Xen hypervisor
package.
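To make 1) concrete, the relevant lines of the domain configuration
file end up reading:

    # device_model_version = 'qemu-xen'
    device_model_version = 'qemu-xen-traditional'

And for 2), on Debian this command line is typically set via
GRUB_CMDLINE_XEN_DEFAULT, e.g. in /etc/default/grub.d/xen.cfg,
followed by running update-grub:

    GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=3G,max:3G smt=false pv-l1tf=false iommu=1"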
All the best,
Chuck