Hi Alex,

On Thu, Feb 27, 2020 at 1:19 PM Alex Bennée <[email protected]> wrote:

>
> Niek Linnenbank <[email protected]> writes:
>
> > Hi Alex,
> >
> > On Wed, Feb 26, 2020 at 7:13 PM Alex Bennée <[email protected]>
> wrote:
> >
> >> While 32MiB is certainly usable, a full system boot ends up flushing
> >> the codegen buffer nearly 100 times. Increase the default on 64-bit
> >> hosts to take advantage of all that spare memory. After this change I
> >> can boot my test system without any TB flushes.
> >>
> >
> > That's great: with this change I'm seeing a performance improvement
> > when running the avocado tests for cubieboard. It runs about 4-5
> > seconds faster. My host is Ubuntu 18.04 on 64-bit.
> >
> > I don't know much about the internals of TCG, nor how it actually uses
> > the cache, but it seems logical to me that increasing the cache size
> > would improve performance.
> >
> > What I'm wondering is: will this also result in TCG translating larger
> > chunks in one shot, potentially taking more time to do the translation?
> > If so, could it affect more latency-sensitive code?
>
> No - the size of the translation blocks is governed by the guest code
> and where it ends a basic block. In system mode we also care about
> crossing guest page boundaries.
>
> >> Signed-off-by: Alex Bennée <[email protected]>
> >>
> > Tested-by: Niek Linnenbank <[email protected]>
> >
> >
> >> ---
> >>  accel/tcg/translate-all.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> >> index 4ce5d1b3931..f7baa512059 100644
> >> --- a/accel/tcg/translate-all.c
> >> +++ b/accel/tcg/translate-all.c
> >> @@ -929,7 +929,11 @@ static void page_lock_pair(PageDesc **ret_p1,
> >> tb_page_addr_t phys1,
> >>  # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
> >>  #endif
> >>
> >> +#if TCG_TARGET_REG_BITS == 32
> >>  #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
> >> +#else
> >> +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (2 * GiB)
> >> +#endif
> >>
> >
> > The QEMU process now takes up more virtual memory, about 2.5GiB in my
> > test, which can be expected with this change.
> >
> > Is it very likely that the TCG cache will be filled quickly and
> > completely? I'm asking because I also use QEMU to do automated testing
> > where the nodes are 64-bit but each have only 2GiB of physical RAM.
>
> Well, this is the interesting question and, as ever, it depends.
>
> For system emulation the buffer will just slowly fill up over time until
> exhausted, at which point it will flush and reset. Each time the guest
> needs to flush a page and load fresh code in, we will generate more
> translated code. If the guest isn't under load and never uses all its
> RAM for code, then in theory the pages of the mmap that are never filled
> never need to be actualised by the host kernel.
>
> You can view the behaviour by running "info jit" from the HMP monitor in
> your tests. The "TB Flush" value shows the number of times this has
> happened along with other information about translation state.
>
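For anyone who wants to check this on their own setup, the monitor can be reached with an invocation along these lines (a sketch only: the machine type, memory size, and image paths below are placeholders for your own guest):

```shell
# Hypothetical invocation: substitute your own machine type and images.
# -monitor stdio puts the HMP monitor on the terminal.
qemu-system-arm -M cubieboard -m 1024 \
    -kernel ./vmlinuz -dtb ./sun4i-a10-cubieboard.dtb \
    -monitor stdio

# Then, at the (qemu) prompt:
#   (qemu) info jit
# The "TB flush" value shows how often the codegen buffer was flushed;
# with a large enough buffer it should stay at 0 after a full boot.
```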

Thanks for clarifying this; now it all starts to make more sense to me.

Regards,
Niek


>
> >
> > Regards,
> > Niek
> >
> >
> >>
> >>  #define DEFAULT_CODE_GEN_BUFFER_SIZE \
> >>    (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
> >> --
> >> 2.20.1
> >>
> >>
> >>
>
>
> --
> Alex Bennée
>


-- 
Niek Linnenbank
