Hi Alex,

On Thu, Feb 27, 2020 at 1:19 PM Alex Bennée <[email protected]> wrote:
> Niek Linnenbank <[email protected]> writes:
>
> > Hi Alex,
> >
> > On Wed, Feb 26, 2020 at 7:13 PM Alex Bennée <[email protected]> wrote:
> >
> >> While 32mb is certainly usable a full system boot ends up flushing the
> >> codegen buffer nearly 100 times. Increase the default on 64 bit hosts
> >> to take advantage of all that spare memory. After this change I can
> >> boot my test system without any TB flushes.
> >
> > That's great, with this change I'm seeing a performance improvement
> > when running the avocado tests for cubieboard. It runs about 4-5
> > seconds faster. My host is Ubuntu 18.04 on 64-bit.
> >
> > I don't know much about the internals of TCG nor how it actually uses
> > the cache, but it seems logical to me that increasing the cache size
> > would improve performance.
> >
> > What I'm wondering is: will this also result in TCG translating larger
> > chunks in one shot, so potentially taking more time to do the
> > translation? If so, could it perhaps affect more latency-sensitive
> > code?
>
> No - the size of the translation blocks is governed by the guest code
> and where it ends a basic block. In system mode we also care about
> crossing guest page boundaries.
> >> Signed-off-by: Alex Bennée <[email protected]>
> >
> > Tested-by: Niek Linnenbank <[email protected]>
> >
> >> ---
> >>  accel/tcg/translate-all.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> >> index 4ce5d1b3931..f7baa512059 100644
> >> --- a/accel/tcg/translate-all.c
> >> +++ b/accel/tcg/translate-all.c
> >> @@ -929,7 +929,11 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
> >>  # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
> >>  #endif
> >>
> >> +#if TCG_TARGET_REG_BITS == 32
> >>  #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
> >> +#else
> >> +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (2 * GiB)
> >> +#endif
> >
> > The qemu process now takes up more virtual memory, about ~2.5GiB in my
> > test, which can be expected with this change.
> >
> > Is it very likely that the TCG cache will be filled quickly and
> > completely? I'm asking because I also use QEMU to do automated testing
> > where the nodes are 64-bit but each have only 2GiB physical RAM.
>
> Well so this is the interesting question and as ever it depends.
>
> For system emulation the buffer will just slowly fill up over time until
> exhausted, at which point it will flush and reset. Each time the guest
> needs to flush a page and load fresh code in, we will generate more
> translated code. If the guest isn't under load and never uses all its
> RAM for code then in theory the pages of the mmap that are never filled
> never need to be actualised by the host kernel.
>
> You can view the behaviour by running "info jit" from the HMP monitor in
> your tests. The "TB Flush" value shows the number of times this has
> happened along with other information about translation state.

Thanks for clarifying this, now it all starts to make more sense to me.
Regards,
Niek

> > Regards,
> > Niek
> >
> >>  #define DEFAULT_CODE_GEN_BUFFER_SIZE \
> >>    (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
> >> --
> >> 2.20.1
>
> --
> Alex Bennée

--
Niek Linnenbank
