Re: [Interest] Cross platform accelerated instructions framework

Nuno Santos Thu, 14 May 2015 08:14:58 -0700

Thanks for all the insights so far. 

I confess I’m a newbie regarding performance optimisation. I’m writing a 
synthesiser. It computes audio in real time and is already getting heavy at 
8 voices of polyphony. It pushes an iPad 2 to its limit, which is not good, 
especially when I interact with the user interface and the audio starts 
glitching.


I think there is a lot of room for optimisation, especially in how the code 
is structured, but I’m not sure.

Maybe I should first optimise the code for maximum performance using the 
compiler only. I have a lot of encapsulation and I’m not sure whether that is 
good for optimisation. For example, the following function calculates the 
output of one of the synthesiser voices. Sorry for the long code listing, but 
maybe someone could point out basic mistakes I’m making that completely 
defeat compiler optimisations.

Of course, for vectorisation I will need to identify opportunities and 
refactor the data structures to make it possible. But who knows, maybe I’m 
doing terrible things, and fixing them could save me a good number of 
important CPU cycles?
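
A minimal sketch of what such a refactoring might look like, assuming the 8 
voices mentioned above are processed together. The names VoiceBank, kVoices 
and mixVoices are hypothetical and do not come from the synthesiser's code. 
Each parameter is stored as one array across all voices, so the mixing loop 
reads contiguous floats and a compiler may be able to vectorise it.

```cpp
#include <cstddef>

// Assumption: 8 voices, matching the polyphony mentioned in the post.
constexpr std::size_t kVoices = 8;

// Hypothetical structure-of-arrays layout: one array per parameter,
// indexed by voice, instead of one object per voice.
struct VoiceBank {
    float osc1[kVoices];
    float osc2[kVoices];
    float osc1Volume[kVoices];
    float osc2Volume[kVoices];
    float output[kVoices];
};

// Mixes the two oscillators of every voice in one simple loop:
// no function calls, no branches, arrays indexed only by the loop counter.
void mixVoices(VoiceBank& bank)
{
    for (std::size_t v = 0; v < kVoices; ++v)
        bank.output[v] = bank.osc1[v] * bank.osc1Volume[v]
                       + bank.osc2[v] * bank.osc2Volume[v];
}
```

Whether this pays off depends on how much of the per-voice work can be 
expressed in loops like this one, so it would need measuring on the device.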

(This is by far the longest function in the whole program.)

// typedef float IAudioSample

IAudioSample IBasicSynthVoice::step()
{
    IAudioSample output=0;
    IAudioSample filterModulation=0;
    IAudioSample pitchModulationSum=0;
    IAudioSample tmp1=0,tmp2=0;

    float eg1 = _eg[0].step();
    float eg2 = _eg[1].step();

    eg1 += eg1*_modWheelMultiplier[MODWHEEL_EG_1];
    eg2 += eg2*_modWheelMultiplier[MODWHEEL_EG_2];

    // applying pitch modulation
    switch (_pitchModulationSource[0])
    {
        case 1:
            tmp1 += _lfo1;
            tmp1 += _lfo1;
            break;
        case 2:
            tmp1 += eg1;
            tmp1 += eg1;
            tmp1 += eg1;
            tmp1 += eg1;
            break;
        case 3:
            tmp1 += _lfo1;
            tmp1 += _lfo1;
            tmp1 += eg1;
            tmp1 += eg1;
            tmp1 += eg1;
            tmp1 += eg1;
            break;
        default:
            break;
    }

    tmp1 *= _pitchModulationAmount[0];

    switch (_pitchModulationSource[1])
    {
        case 1:
            tmp2 += _lfo2;
            tmp2 += _lfo2;
            break;
        case 2:
            tmp2 += eg2;
            tmp2 += eg2;
            tmp2 += eg2;
            tmp2 += eg2;
            break;
        case 3:
            tmp2 += _lfo2;
            tmp2 += _lfo2;
            tmp2 += eg2;
            tmp2 += eg2;
            tmp2 += eg2;
            tmp2 += eg2;
            break;
        default:
            break;
    }

    tmp2 *= _pitchModulationAmount[1];

    pitchModulationSum = (tmp1+tmp2)/12.f;
    pitchModulationSum *= _noteFrequency;

    float _osc1PitchModulation = 0;
    float _osc2PitchModulation = 0;
    float _subPitchModulation = 0;

    switch (_pitchModulationDestination)
    {
        case 1:
            _osc1PitchModulation += pitchModulationSum;
            break;
        case 2:
            _osc2PitchModulation += pitchModulationSum;
            break;
        case 3:
            _osc1PitchModulation += pitchModulationSum;
            _osc2PitchModulation += pitchModulationSum;
            break;
        case 4:
            _subPitchModulation += pitchModulationSum;
            break;
        case 5:
            _osc1PitchModulation += pitchModulationSum;
            _subPitchModulation += pitchModulationSum;
            break;
        case 6:
            _osc2PitchModulation += pitchModulationSum;
            _subPitchModulation += pitchModulationSum;
            break;
        case 7:
            _osc1PitchModulation += pitchModulationSum;
            _osc2PitchModulation += pitchModulationSum;
            _subPitchModulation += pitchModulationSum;
            break;
    }

    if (_pitchBendDestination[PITCHBEND_OSC_1])
    {
        _osc1PitchModulation += _pitchBendMultiplier*_noteFrequency;
        _subPitchModulation += _pitchBendMultiplier*_noteFrequency;
    }

    if (_pitchBendDestination[PITCHBEND_OSC_2])
    {
        _osc2PitchModulation += _osc2.frequency()*_pitchBendMultiplier;
    }

    _osc1.setModulation(_osc1PitchModulation);
    _osc2.setModulation(_osc2PitchModulation);
    _sub.setModulation(_subPitchModulation);

    float sub = _sub.step();
    float osc1 = _osc1.step();
    float osc2 = _osc2.step();

    if (_osc2Sync && _osc1.sync())
        _osc2.setPhase(0);

    float ring = osc1*osc2;

    // FM
    //_osc1.setModulation(osc2*_crossModulationAmount*2500);

    // mixer
    output = (ring*_ringAmount);
    output += (osc1*_osc1Volume);
    output += (osc2*_osc2Volume);
    output += (sub*_subVolume);
    output += (_noise);

    _saturator.process(&output, &output);

    //calculateFilterModulation(eg2, osc2);
    // applying filter modulation

    // modulation amount - eg2
    filterModulation +=
        eg2*_filterModulationAmount[0]*(1+_filterModulationAmount[5]*_velocity);

    // filter modulation amount - 1 - lfo1
    filterModulation +=
        _lfo1*_filterModulationAmount[1]*_filterModulationAmount[1];

    // filter modulation amount - 2 - lfo2
    filterModulation +=
        _lfo2*_filterModulationAmount[2]*_filterModulationAmount[2];

    // filter modulation amount - 3 - vco2
    filterModulation += osc2*_filterModulationAmount[3];

    // filter modulation amount - 4 - kbd

    if (_pitchBendDestination[PITCHBEND_FILTER])
        filterModulation += (powf(2, _pitchBendRange*_pitchBend)-1);

    _filter.setKeyboardMultiplier(_kbdFilter);
    _filter.setModulation(filterModulation);

    _filter.process(&output, &output);

    // filter modulation amount - 5 - vel

    // vca modulation - eg1, eg2, kbd
    output *=
        (eg1*_ampModulationAmount[2]+eg2*_ampModulationAmount[3])*_kbdFilter;
    output *= (1+_ampModulationAmount[5]*_velocity);

    float ampModulationSum = 0;

    // vca modulation - lfo1
    ampModulationSum += _lfo1*_ampModulationAmount[0];

    // vca modulation - lfo2
    ampModulationSum += _lfo2*_ampModulationAmount[1];

    if (ampModulationSum>1.5)
        ampModulationSum=1.5;

    if (ampModulationSum<-1.5)
        ampModulationSum=-1.5;

    output -= output*ampModulationSum;

    return output;
}
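
One observation on the listing: the case values 1 to 7 in the 
_pitchModulationDestination switch behave like bit flags (1 = osc1, 2 = osc2, 
4 = sub). A hedged sketch of a branch-light equivalent follows. The names 
PitchModulation and routePitchModulation are hypothetical, and the sketch 
assumes the destination value always lies in the range 0 to 7.

```cpp
// Hypothetical result type: one modulation amount per oscillator.
struct PitchModulation {
    float osc1;
    float osc2;
    float sub;
};

// Assumption: destination is in 0..7 and its bits select the targets
// (bit 0 = osc1, bit 1 = osc2, bit 2 = sub), mirroring the switch cases.
PitchModulation routePitchModulation(int destination, float sum)
{
    PitchModulation m;
    m.osc1 = (destination & 1) ? sum : 0.f;
    m.osc2 = (destination & 2) ? sum : 0.f;
    m.sub  = (destination & 4) ? sum : 0.f;
    return m;
}
```

For values outside 0 to 7 the switch does nothing, while this version still 
looks at the low three bits, so the two are only equivalent under the stated 
range assumption.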

Nuno Santos

> On 14 May 2015, at 13:52, Allan Sandfeld Jensen <k...@carewolf.com> wrote:
> 
> To write in a way that the compiler can auto-vectorize, write the CPU 
> intensive work in simple inner loops without function calls (or only inlined 
> ones), use no array access by anything other than the index counter, and also 
> avoid branches as much as possible. If you do need branches, write them as 
> using conditional assign with c ? a : b.
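
As a rough illustration of the advice quoted above, the clamp to the range 
-1.5 to 1.5 at the end of step() could be applied to a whole block of samples 
in one simple loop. This is only a sketch: applyAmpModulation and its 
parameters are hypothetical, and it assumes the samples and the modulation 
values have already been collected into plain float arrays.

```cpp
#include <cstddef>

// Hypothetical helper: applies a clamped amplitude modulation to n samples.
// The clamp is written with conditional assignment (c ? a : b) instead of
// if statements, and the arrays are indexed only by the loop counter.
void applyAmpModulation(float* out, const float* mod, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        float m = mod[i];
        m = (m > 1.5f) ? 1.5f : m;    // upper clamp
        m = (m < -1.5f) ? -1.5f : m;  // lower clamp
        out[i] -= out[i] * m;         // same update as in step()
    }
}
```

The float literals (1.5f rather than 1.5) keep the comparison in single 
precision. Whether the loop actually gets vectorised depends on the compiler 
and its flags.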

_______________________________________________
Interest mailing list
Interest@qt-project.org
http://lists.qt-project.org/mailman/listinfo/interest
