Hi all,

I am writing some DSP software in Forth using gforth.

https://github.com/akjmicro/fsyn

I will be working on a fork specifically designed to squeeze more
performance from the same basic code, but targeting the needs of the RPi3,
which is noticeably slower in its floating-point performance than my 2013
MacBook Pro (running Linux Mint).

I want to ask some more seasoned Forth experts how I might best approach
the problem. Some things I have considered:

1) going to fixed-point math

2) using fast approximated trig functions (related to #1, although in
theory trig approximations offer speedups in either fixed or floating
point, even if fixed-point trig is generally expected to be the faster of
the two)

3) lookup tables (these seem to be slower than promised, but maybe that is
an issue with the gforth interpreter slowing things down, or the lookups
themselves are slow on the ARM architecture in question?)

4) using gforth as "glue", and dropping to C or Assembler for relief from
performance bottlenecks (even though Forth has a reputation for being
fast for an interpreted language)

5) Optimising for the ARM's VFP/NEON floating-point units, which I believe
would have to be done via one of these ways (or a combination):
  a) compiling gforth with certain GCC flags for optimising for ARM
  b) making direct assembly calls to NEON instructions (is this possible in
gforth currently?)
  c) somehow linking in C object code that is floating-point optimized.

Does anyone have any thoughts or wisdom in this area to share?
Particularly, I'm curious about those with experience optimizing gforth
performance for the RPi.

Best,

AKJ
