Today I ran more precise tests using --enable-profiler.
Here is the test procedure:
1. Boot the Linux kernel 5 times.
2. On each iteration, wait until "JIT cycles" has been stable for ~10 seconds.
3. Write down the "cycles/op" value.

Here are the results:

Before clean-up:
min: 731.9
max: 735.8
avg: 734.3
standard deviation: ~2 = 0.3%
Average cycles/op = 734 +- 2

After clean-up:
min: 747.2
max: 751.7
avg: 750.5
standard deviation: ~2 = 0.3%
Average cycles/op = 750 +- 2
Slow-down of TCG code generation = 2.2%

After clean-up with TCGContext *const tcg_cur_ctx:
min: 730.6
max: 733.2
avg: 728.7
standard deviation: ~2 = 0.3%
Average cycles/op = 729 +- 2
Slow-down of TCG code generation = 0%

I suggest defining tcg_cur_ctx as TCGContext *const. Then we get rid of the TCG code generation slow-down and still avoid using global variables.

On 10/25/2012 10:45 AM, Evgeny Voevodin wrote:
Here are the results of tests run before and after this patch series was applied:

* EEMBC CoreMark (before -> after)
  - Guest: Exynos4210 ARMv7, Linux (Custom buildroot image)
  - Host: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz, 4GB RAM, Linux
  - Results: 1148.105626 -> 1161.440186 (+1.16%)
* nbench (before -> after)
  - Guest: Exynos4210 ARMv7, Linux (Custom buildroot image)
  - Host: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz, 4GB RAM, Linux
  - Results
    . MEMORY INDEX: 1.864 -> 1.862 (-0.11%)
    . INTEGER INDEX: 2.518 -> 2.523 (+0.2%)
    . FLOATING-POINT INDEX: 0.385 -> 0.394 (+2.34%)

These tests suggest that it even became faster :)) But I'm quite sceptical about such results. nbench prints a warning if its results are not 95% statistically accurate, so we can be sure that the nbench results are 95% accurate, and the differences shown above are clearly within that accuracy. I don't know the accuracy of CoreMark. So the main conclusion we can draw is that this patch series didn't introduce any slow-down that exceeds the measurement inaccuracy. Is this enough?

On 10/23/2012 10:21 AM, Evgeny Voevodin wrote:

This set of patches moves global variables to tcg_ctx:
gen_opc_ptr
gen_opparam_ptr
gen_opc_buf
gen_opparam_buf

Build tested for all targets.
Execution tested on ARM.
I didn't notice any slow-down of kernel boot after this set was applied.

Changelog:
v1->v2:
Introduced TCGContext *tcg_cur_ctx global to use in those places where we don't have an interface to pass a pointer to tcg_ctx.
Code style clean-up

Evgeny (2):
  tcg/tcg.h: Duplicate global TCG variables in TCGContext
  TCG: Remove unused global variables

Evgeny Voevodin (5):
  translate-all.c: Introduce TCGContext *tcg_cur_ctx
  TCG: Use gen_opc_ptr from context instead of global variable.
  TCG: Use gen_opparam_ptr from context instead of global variable.
  TCG: Use gen_opc_buf from context instead of global variable.
  TCG: Use gen_opparam_buf from context instead of global variable.
 gen-icount.h                  |    2 +-
 target-alpha/translate.c      |   10 +-
 target-arm/translate.c        |   10 +-
 target-cris/translate.c       |   13 +-
 target-i386/translate.c       |   10 +-
 target-lm32/translate.c       |   13 +-
 target-m68k/translate.c       |   10 +-
 target-microblaze/translate.c |   13 +-
 target-mips/translate.c       |   11 +-
 target-openrisc/translate.c   |   13 +-
 target-ppc/translate.c        |   11 +-
 target-s390x/translate.c      |   11 +-
 target-sh4/translate.c        |   10 +-
 target-sparc/translate.c      |   10 +-
 target-unicore32/translate.c  |   10 +-
 target-xtensa/translate.c     |    8 +-
 tcg/optimize.c                |   62 ++++----
 tcg/tcg-op.h                  |  324 ++++++++++++++++++++---------------------
 tcg/tcg.c                     |   85 ++++++-----
 tcg/tcg.h                     |   11 +-
 translate-all.c               |    4 +-
 21 files changed, 328 insertions(+), 323 deletions(-)
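
To illustrate the idea behind the series, here is a minimal sketch of moving an opcode buffer and its write pointer from file-scope globals into a context structure that is reached through a constant pointer. All names (DemoContext, demo_ctx, demo_cur_ctx, DEMO_OPC_BUF_SIZE) are hypothetical stand-ins, not the actual QEMU definitions, and the buffer size is arbitrary. The pointer is declared *const here on the assumption that a pointer which can never be reassigned lets the compiler treat its value as fixed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_OPC_BUF_SIZE 64 /* arbitrary size for this sketch */

/* Simplified stand-in for a translation context: the state that
 * used to live in separate globals is grouped in one structure. */
typedef struct DemoContext {
    uint16_t gen_opc_buf[DEMO_OPC_BUF_SIZE];
    uint16_t *gen_opc_ptr;
} DemoContext;

static DemoContext demo_ctx;

/* Constant pointer to the context: the pointee is writable, but the
 * pointer itself cannot be changed after initialization. */
static DemoContext *const demo_cur_ctx = &demo_ctx;

/* Reset the write pointer to the start of the buffer. */
static void demo_start(void)
{
    demo_cur_ctx->gen_opc_ptr = demo_cur_ctx->gen_opc_buf;
}

/* Append one opcode through the context instead of a global. */
static void demo_emit(uint16_t opc)
{
    assert(demo_cur_ctx->gen_opc_ptr <
           demo_cur_ctx->gen_opc_buf + DEMO_OPC_BUF_SIZE);
    *demo_cur_ctx->gen_opc_ptr++ = opc;
}

/* Number of opcodes emitted since the last demo_start(). */
static size_t demo_count(void)
{
    return (size_t)(demo_cur_ctx->gen_opc_ptr - demo_cur_ctx->gen_opc_buf);
}
```

Code that has no context parameter available can call demo_emit() directly, while code that already receives a context pointer can keep using that pointer instead.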
-- 
Kind regards,
Evgeny Voevodin,
Technical Leader, Mobile Group,
Samsung Moscow Research Center,
e-mail: [email protected]
