Thanks for all the insights so far. I confess I’m a newbie when it comes to performance optimisation. I’m writing a synthesiser. It computes audio in real time, and it’s already getting heavy with 8 voices of polyphony. It pushes an iPad 2 to its limit, which is not good, especially since the audio starts glitching when I interact with the user interface.
I think there is a lot of room for optimisation, especially in how the code is structured, but I’m not sure. Maybe I should first make sure the code is written so that the compiler alone can optimise it as far as possible. I use a lot of encapsulation and I’m not sure whether that is good for optimisation.

For example, the following function calculates the output of one of the synthesiser voices. Sorry for the long code listing, but maybe someone could point out basic mistakes I’m making that completely defeat the compiler’s optimisations. Of course, for vectorisation I will need to identify opportunities and refactor the data structures to make it possible. But who knows, maybe I’m doing terrible things that cost me a good number of CPU cycles? (This is by far the longest function in the whole program.)

// typedef float IAudioSample
IAudioSample IBasicSynthVoice::step()
{
    IAudioSample output=0;
    IAudioSample filterModulation=0;
    IAudioSample pitchModulationSum=0;
    IAudioSample tmp1=0,tmp2=0;

    float eg1 = _eg[0].step();
    float eg2 = _eg[1].step();

    eg1 += eg1*_modWheelMultiplier[MODWHEEL_EG_1];
    eg2 += eg2*_modWheelMultiplier[MODWHEEL_EG_2];

    // applying pitch modulation
    switch (_pitchModulationSource[0])
    {
        case 1:
            tmp1 += _lfo1;
            tmp1 += _lfo1;
            break;
        case 2:
            tmp1 += eg1;
            tmp1 += eg1;
            tmp1 += eg1;
            tmp1 += eg1;
            break;
        case 3:
            tmp1 += _lfo1;
            tmp1 += _lfo1;
            tmp1 += eg1;
            tmp1 += eg1;
            tmp1 += eg1;
            tmp1 += eg1;
            break;
        default:
            break;
    }
    tmp1 *= _pitchModulationAmount[0];

    switch (_pitchModulationSource[1])
    {
        case 1:
            tmp2 += _lfo2;
            tmp2 += _lfo2;
            break;
        case 2:
            tmp2 += eg2;
            tmp2 += eg2;
            tmp2 += eg2;
            tmp2 += eg2;
            break;
        case 3:
            tmp2 += _lfo2;
            tmp2 += _lfo2;
            tmp2 += eg2;
            tmp2 += eg2;
            tmp2 += eg2;
            tmp2 += eg2;
            break;
        default:
            break;
    }
    tmp2 *= _pitchModulationAmount[1];

    pitchModulationSum = (tmp1+tmp2)/12.f;
    pitchModulationSum *= _noteFrequency;

    float _osc1PitchModulation = 0;
    float _osc2PitchModulation = 0;
    float _subPitchModulation = 0;

    switch (_pitchModulationDestination)
    {
        case 1:
            _osc1PitchModulation += pitchModulationSum;
            break;
        case 2:
            _osc2PitchModulation += pitchModulationSum;
            break;
        case 3:
            _osc1PitchModulation += pitchModulationSum;
            _osc2PitchModulation += pitchModulationSum;
            break;
        case 4:
            _subPitchModulation += pitchModulationSum;
            break;
        case 5:
            _osc1PitchModulation += pitchModulationSum;
            _subPitchModulation += pitchModulationSum;
            break;
        case 6:
            _osc2PitchModulation += pitchModulationSum;
            _subPitchModulation += pitchModulationSum;
            break;
        case 7:
            _osc1PitchModulation += pitchModulationSum;
            _osc2PitchModulation += pitchModulationSum;
            _subPitchModulation += pitchModulationSum;
            break;
    }

    if (_pitchBendDestination[PITCHBEND_OSC_1])
    {
        _osc1PitchModulation += _pitchBendMultiplier*_noteFrequency;
        _subPitchModulation += _pitchBendMultiplier*_noteFrequency;
    }

    if (_pitchBendDestination[PITCHBEND_OSC_2])
    {
        _osc2PitchModulation += _osc2.frequency()*_pitchBendMultiplier;
    }

    _osc1.setModulation(_osc1PitchModulation);
    _osc2.setModulation(_osc2PitchModulation);
    _sub.setModulation(_subPitchModulation);

    float sub = _sub.step();
    float osc1 = _osc1.step();
    float osc2 = _osc2.step();

    if (_osc2Sync && _osc1.sync())
        _osc2.setPhase(0);

    float ring = osc1*osc2;

    // FM
    //_osc1.setModulation(osc2*_crossModulationAmount*2500);

    // mixer
    output = (ring*_ringAmount);
    output += (osc1*_osc1Volume);
    output += (osc2*_osc2Volume);
    output += (sub*_subVolume);
    output += (_noise);

    _saturator.process(&output, &output);

    //calculateFilterModulation(eg2, osc2);

    // applying filter modulation
    // modulation amount - eg2
    filterModulation += eg2*_filterModulationAmount[0]*(1+_filterModulationAmount[5]*_velocity);

    // filter modulation amount - 1 - lfo1
    filterModulation += _lfo1*_filterModulationAmount[1]*_filterModulationAmount[1];

    // filter modulation amount - 2 - lfo2
    filterModulation += _lfo2*_filterModulationAmount[2]*_filterModulationAmount[2];

    // filter modulation amount - 3 - vco2
    filterModulation += osc2*_filterModulationAmount[3];

    // filter modulation amount - 4 - kbd
    if (_pitchBendDestination[PITCHBEND_FILTER])
        filterModulation += (powf(2, _pitchBendRange*_pitchBend)-1);

    _filter.setKeyboardMultiplier(_kbdFilter);
    _filter.setModulation(filterModulation);
    _filter.process(&output, &output);

    // filter modulation amount - 5 - vel

    // vca modulation - eg1, eg2, kbd
    output *= (eg1*_ampModulationAmount[2]+eg2*_ampModulationAmount[3])*_kbdFilter;
    output *= (1+_ampModulationAmount[5]*_velocity);

    float ampModulationSum = 0;

    // vca modulation - lfo1
    ampModulationSum += _lfo1*_ampModulationAmount[0];

    // vca modulation - lfo2
    ampModulationSum += _lfo2*_ampModulationAmount[1];

    if (ampModulationSum>1.5)
        ampModulationSum=1.5;

    if (ampModulationSum<-1.5)
        ampModulationSum=-1.5;

    output -= output*ampModulationSum;

    return output;
}

Nuno Santos

> On 14 May 2015, at 13:52, Allan Sandfeld Jensen <k...@carewolf.com> wrote:
>
> To write in a way that the compiler can auto-vectorize, write the CPU
> intensive work in simple inner loops without function calls (or only inlined
> ones), use no array access by anything other than the index counter, and also
> avoid branches as much as possible. If you do need branches, write them as
> using conditional assign with c ? a : b.
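
Applied to step(), the quoted advice might look roughly like the sketch below. It is only an illustration under stated assumptions, not the synthesiser's actual code: the function names pitchModulation and applyAmpModulation are hypothetical, the pitch modulation source is assumed to always be in the range 0..3, and the LFO values are assumed to be available as per-sample buffers so that the amplitude stage can run as one simple loop over a block. The first function replaces a switch with conditional assignments (bit 0 of the source selects the LFO with weight 2, bit 1 selects the envelope with weight 4, matching the repeated additions in the original). The second applies the clamped LFO amplitude modulation over a block, indexing only by the loop counter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: one pitch-modulation slot without a switch.
// Assumes source is 0..3, as in the cases handled by the original code.
inline float pitchModulation(int source, float lfo, float eg, float amount)
{
    assert(source >= 0 && source <= 3);
    const float lfoWeight = (source & 1) ? 2.0f : 0.0f; // cases 1 and 3
    const float egWeight  = (source & 2) ? 4.0f : 0.0f; // cases 2 and 3
    return (lfoWeight * lfo + egWeight * eg) * amount;
}

// Hypothetical block version of the final VCA stage: no function calls,
// no branches other than conditional assignment, arrays indexed by i only.
// The per-sample LFO buffers are an assumption, not part of the original.
inline void applyAmpModulation(float* out,
                               const float* lfo1, const float* lfo2,
                               float amount1, float amount2,
                               std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        float sum = lfo1[i] * amount1 + lfo2[i] * amount2;
        // Clamp to [-1.5, 1.5] with float literals and conditional assignment.
        sum = sum > 1.5f ? 1.5f : (sum < -1.5f ? -1.5f : sum);
        out[i] -= out[i] * sum;
    }
}
```

Whether the loop is actually vectorised depends on the compiler and its flags; with Clang, -Rpass=loop-vectorize prints a remark for each loop that was. Note that 2*lfo + 4*eg can differ from the original chain of additions in the last bit of rounding.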
_______________________________________________ Interest mailing list Interest@qt-project.org http://lists.qt-project.org/mailman/listinfo/interest