Hi,
I'm trying to implement my own machine for Amiga emulation using a software
CPU and FPGA hardware. The machine consists of a large malloc'ed RAM block
and some FPGA hardware mmapped elsewhere into the memory space.
I'm using QEMU in system mode to emulate a 68040 on an ARM Cortex-A9 host.
It is working, though I'm investigating a strange performance issue.
I'm looking for advice from the specialist(s) on accel/tcg/cputlb.c about
where to look next in debugging this, please.
To investigate the performance issue I tried to break it down to the
simplest possible case. I can reproduce it with a simple for loop (compiled
without optimisation).
    for (int i = 0; i != 0xffffff; ++i)
    {
        if ((i & 0xffff) == 0)
        {
        }
    }
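For anyone wanting to reproduce the timings, a minimal self-contained sketch
of the same loop is below. The function names count_hits and time_loop are
made up for illustration, the counter is made volatile so that every
increment remains a real store even if optimisation is enabled, and the hit
count is only there so the result can be checked. It assumes the C library
on the guest provides a working clock().

```c
#include <stdio.h>
#include <time.h>

/* Same loop as in the mail, but counting how often the branch is taken.
 * The volatile counter forces a load and a store on every iteration. */
static unsigned long count_hits(void)
{
    volatile unsigned long i;
    unsigned long hits = 0;

    for (i = 0; i != 0xffffff; ++i) {
        if ((i & 0xffff) == 0) {
            ++hits;
        }
    }
    return hits;
}

/* Runs the loop once, stores the hit count in *hits_out and returns the
 * CPU time used, in seconds, as reported by clock(). */
static double time_loop(unsigned long *hits_out)
{
    clock_t start = clock();

    *hits_out = count_hits();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_loop(&hits) from main and printing the returned value with
printf("%f\n", ...) gives one number per run that can be compared across
user mode, the Linux guest and the AmigaOS guest.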
Running it in user mode on the same host takes ~0.6 seconds. In the
built-in 'virtual' m68k machine running Linux it takes 1.3 seconds.
However, in my machine under AmigaOS I'm typically seeing it take five and
a half minutes! Occasionally it seems to run at the correct speed of <2
seconds, though I have yet to identify why. These are the logs of the
captured code before it goes into the main chain loop.
qemu_slow_stuck_fragment.log
<http://www.64kib.com/qemu_slow_stuck_fragment.log>
I have verified that this performance change is not due to slow access to
the FPGA memory area, i.e. there are no accesses to that memory region
while the loop is running.
I took a look in gdb while running this loop to see what is going on.
Initially I was surprised that I didn't find the code in 'OUT:', however I
guess it makes sense that it has to call into the framework for memory
access. I noticed that a lot of calls into GLib are made, and I see
g_tree_lookup called a lot. This is caused by notdirty_write being called
thousands of times, each time going into page_collection_lock and
tb_invalidate_phys_page_fast. I presume this is happening each time that
"i" is incremented on the stack, which clearly has a huge overhead.
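A sketch of one way to test that presumption: run the same loop once with
the counter on the stack and once with the counter in a freshly allocated
block that is unlikely to share a page with any code. If the slow path is
triggered by the stack write landing in a page that also holds translated
code, only the stack variant should be slow. All function names here are
made up for illustration, and the 4096-byte page size is an assumption
about the guest, not something checked against the actual MMU setup.

```c
#include <stdlib.h>

/* Assumption: guest page size. Adjust if the target uses something else. */
#define ASSUMED_PAGE_SIZE 4096u

/* Runs the loop using *counter as the loop variable, so the caller decides
 * where the repeatedly written word lives. Returns the number of times
 * the branch was taken. */
static unsigned long count_hits_at(volatile unsigned long *counter)
{
    unsigned long hits = 0;

    for (*counter = 0; *counter != 0xffffff; ++*counter) {
        if ((*counter & 0xffff) == 0) {
            ++hits;
        }
    }
    return hits;
}

/* Variant 1: the counter lives on the stack, as in the original loop. */
static unsigned long count_hits_on_stack(void)
{
    volatile unsigned long i;

    return count_hits_at(&i);
}

/* Variant 2: the counter lives one assumed page into a three-page block,
 * so the page containing it lies entirely inside the allocation.
 * Returns 0 if the allocation fails. */
static unsigned long count_hits_on_heap(void)
{
    unsigned char *block = malloc(3 * ASSUMED_PAGE_SIZE);
    unsigned long hits = 0;

    if (block != NULL) {
        hits = count_hits_at(
            (volatile unsigned long *)(block + ASSUMED_PAGE_SIZE));
        free(block);
    }
    return hits;
}
```

Timing the two variants separately on the AmigaOS guest would show whether
the location of the counter matters. Similar times for both would point
away from the stack page as the cause.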
Even being able to get a proper stack trace from gdb would be very helpful
to understand this. I tried configuring QEMU with '--enable-debug' but
still do not get a proper stack trace if I attach to it. I'm not sure
whether this is because it is running dynamically generated code before
calling into these functions.
Thanks,
Mark