* Goal
  The goal of this work is to look for Thumb-2 code size improvements on
FSF GCC trunk.

* Methodology
  ** Build FSF GCC trunk with and without hardfp, run benchmarks including
EEMBC, SPEC2000, and Dhrystone, and check the generated assembly for
possible code size improvements.
  ** Get input and suggestions from ARM experts.
  ** Search open PRs in GCC Bugzilla.

* Results
Each item is tracked on Launchpad and is listed with the following elements:
 ** Cause: whether the cause of the problem is known
 ** Difficulty: estimated implementation difficulty
 ** Recommendation: Yao's recommended next step for that bug

  1. LP:633233 Push/pop low registers rather than high registers when
keeping stack alignment
  As Richard E. pointed out, this was implemented in gcc-4.5 in 2009, but
Yao can still see r8 being used on FSF GCC trunk.
  Cause: Might be a regression, if the problem does not occur on gcc-4.5.
  Difficulty: Easy.  A regression should not be hard to fix.
  Recommendation: Fix it if it turns out to be a regression.

  2. LP:633243 Improve regrename to make use of low registers.
  Got input from Bernd S. and Julian B.  An initial implementation has been
suggested by Bernd S.
  Cause: The current regrename pass in GCC treats high and low registers
equally, although many Thumb-2 instructions have 16-bit encodings only for
low registers (r0-r7).
  Difficulty: Medium.
  Recommendation: Implement it as Bernd suggested, and benchmark to see
how much code size improves.

  3. LP:634682 Redundant uxth/sxth insns are generated
  Cause: Unknown
  Difficulty: Unknown
  Recommendation: No recommendation so far.
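
  As an illustration only, here is a minimal sketch of the kind of source
in which 16-bit extensions can pile up.  It is not the test case from
LP:634682; the function names are hypothetical, and whether a given GCC
emits a redundant uxth for it would need to be checked in the asm output.

```c
#include <stdint.h>

/* Hypothetical example, not taken from LP:634682. */
uint16_t add_u16(uint16_t a, uint16_t b)
{
    /* The sum is computed in int and truncated to 16 bits on return,
       which normally costs one zero-extension (uxth) on Thumb-2. */
    return (uint16_t)(a + b);
}

uint32_t scale_sum(uint16_t a, uint16_t b)
{
    /* s is already a zero-extended 16-bit value here, so a second
       uxth before the multiply would be redundant. */
    uint16_t s = add_u16(a, b);
    return (uint32_t)s * 3u;
}
```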

  4. LP:634696 Function is inlined inappropriately with -Os
  In consumer/cjpeg/jmemmgr.c, GCC inlines out_of_memory() with -Os, which
increases code size.
  Cause: Unknown.
  Difficulty: Unknown.
  Recommendation: Teach GCC to inline more carefully when -Os is turned on.
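
  Until the inliner handles this by itself, the effect can be approximated
in source.  The sketch below is not the jmemmgr.c code; report_failure(),
check_small() and check_large() are hypothetical names.  It uses the GCC
function attributes noinline and cold to keep a single out-of-line copy of
a rarely executed error path.

```c
#include <stddef.h>

int failure_count;   /* number of times the error path ran */

/* Hypothetical stand-in for an error path such as out_of_memory().
   noinline keeps one shared copy; cold marks it as unlikely to run. */
__attribute__((noinline, cold))
void report_failure(void)
{
    failure_count++;
}

/* Both callers branch to the same out-of-line error path instead of
   each carrying an inlined copy of it. */
int check_small(size_t n)
{
    if (n > 64) {
        report_failure();
        return -1;
    }
    return 0;
}

int check_large(size_t n)
{
    if (n > 4096) {
        report_failure();
        return -1;
    }
    return 0;
}
```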

  5. GCC PR40730 LP:634731 Redundant memory load

  6. LP:634738 Inefficient code to extract the least significant bits from
an integer value
  GCC PR40697 covers Thumb-1.  The same problem exists in Thumb-2.
  Cause: Unknown.
  Difficulty: Medium.
  Recommendation: Fix it in a similar way to GCC PR40697.
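
  For reference, a minimal sketch of this operation in C.  The 23-bit width
and the function names are assumptions chosen for illustration, not the
test case from the bug reports.  Both forms compute the same value, so a
compiler is free to pick whichever is smallest; on Thumb-2 a single ubfx
is also a candidate.

```c
#include <stdint.h>

/* Keep the low 23 bits using a mask.  The constant 0x007FFFFF cannot be
   encoded as a Thumb-2 modified immediate, so a plain "and" would need
   the mask materialized in a register first. */
uint32_t low23_mask(uint32_t v)
{
    return v & 0x007FFFFFu;
}

/* Same result using a shift pair: drop the top 9 bits, then shift back. */
uint32_t low23_shift(uint32_t v)
{
    return (v << 9) >> 9;
}
```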

  7. LP:634891 Replace load/store sequences by memcpy more aggressively
  Difficulty: Should be easy.
  Recommendation: A possible fix is to reduce the threshold value when -Os
is turned on.
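
  A minimal sketch of the two forms involved, assuming a hypothetical
64-byte struct (not taken from the bug report).  A struct assignment may
be expanded inline as a run of loads and stores, while the explicit call
costs roughly one call sequence regardless of the size being copied.

```c
#include <string.h>

/* Hypothetical 64-byte aggregate, for illustration only. */
struct record {
    int id;
    int values[15];
};

/* May be expanded inline into a sequence of loads and stores. */
void copy_by_assignment(struct record *dst, const struct record *src)
{
    *dst = *src;
}

/* Same effect, expressed as a library call. */
void copy_by_memcpy(struct record *dst, const struct record *src)
{
    memcpy(dst, src, sizeof *dst);
}
```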

  8. LP:637220 Allocate local variables with fewer instructions
  GCC PR40657 is about this kind of problem, and was fixed.  A similar
problem exists in GCC with hardfp.
  Cause: Unknown.
  Difficulty: Unknown.
  Recommendation: No recommendation so far.

  9. GCC PR 43721 Failure to optimize (a/b) and (a%b) into single
__aeabi_idivmod call
  Difficulty: Easy to medium.
  Recommendation: No recommendation so far.
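
  A minimal sketch of the pattern, with hypothetical function names.  The
first function is the source form that should ideally become a single
__aeabi_idivmod call, since that helper returns the quotient in r0 and
the remainder in r1.  The second uses the standard div() from <stdlib.h>,
which expresses the same "both results from one operation" idea portably.

```c
#include <stdlib.h>

/* Quotient and remainder of the same operands, written separately. */
void split_separate(int a, int b, int *quot, int *rem)
{
    *quot = a / b;
    *rem  = a % b;
}

/* The same results obtained from one library call. */
void split_div(int a, int b, int *quot, int *rem)
{
    div_t r = div(a, b);
    *quot = r.quot;
    *rem  = r.rem;
}
```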

  10. LP:637814 Combine add/mov into a single add
  LP:637882 Combine ldr/mov into a single ldr
  Possible improvements have been found, but it is not yet clear how to
implement them.
  Cause: Unknown.
  Difficulty: Unknown.
  Recommendation: No recommendation so far.

  11. LP:638014 Replace memset by memclr when 2nd parameter is zero
  Difficulty: Easy.
  Recommendation: No recommendation so far.
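
  A minimal sketch of the two call shapes.  The ARM run-time ABI declares
void __aeabi_memclr(void *dest, size_t n); because that helper only exists
on ARM EABI targets, clear_bytes() below is a hypothetical portable
stand-in with the same two-argument shape.  The saving comes from not
having to load the constant 0 into the second argument register.

```c
#include <stddef.h>
#include <string.h>

/* What the source typically contains: three arguments, the second is 0. */
void zero_with_memset(void *dest, size_t n)
{
    memset(dest, 0, n);
}

/* Hypothetical stand-in with the two-argument shape of __aeabi_memclr:
   the fill value is implied, so the caller passes one argument fewer. */
void clear_bytes(void *dest, size_t n)
{
    unsigned char *p = dest;
    while (n--)
        *p++ = 0;
}
```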

  12. LP:625233 Merge constant pools for small functions
  Cause: Unknown.
  Difficulty: Medium.
  Recommendation: No recommendation so far.

  13. LP:638935 Replace multiple vldr by vldm
  Several vldr insns accessing consecutive addresses can be replaced by a
single vldm.  This is not specific to Thumb-2, but it is related to code
size optimization.
  Cause: Unknown.
  Difficulty: Medium.
  Recommendation: No recommendation so far.
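
  A minimal sketch of source that reads consecutive doubles, which is the
situation where several vldr insns could in principle be merged into one
vldm.  sum4() is a hypothetical example, not the case from LP:638935, and
the insns actually generated depend on the GCC version and options.

```c
/* Loads p[0]..p[3] from consecutive addresses; with VFP enabled each
   load is a candidate vldr, and the four together a candidate vldm. */
double sum4(const double *p)
{
    return p[0] + p[1] + p[2] + p[3];
}
```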

-- 
Yao Qi
CodeSourcery
y...@codesourcery.com
(650) 331-3385 x739
