Hi David,

Thank you for your reply.
I am running the algorithm on an OMAP processor (ARM core), and I also tried it on an iMX processor, where it takes 1.7 times as long as on the OMAP. It is true that the algorithm performs a vector operation that is blowing out the cache. But the question is: how do we lock the cache? How should we implement that in the driver? Example code or a document would be helpful in this regard.

---
Misbah <><

David Hawkins-3 wrote:
>
>
> Hi Misbah,
>
> I would recommend you look at your floating-point code again
> and benchmark each section. You should be able to estimate
> the number of clock cycles required to complete an operation
> and then check that against your measurements.
>
> Depending on whether your algorithm is processing intensive
> or data movement intensive, you may find that the big time
> waster is moving data on or off chip, or perhaps it's a large
> vector operation that is blowing out the cache. If you
> do find that, then on some processors you can lock the
> cache, so your algorithm would require a custom driver
> that steals part of the cache from the OS, but the floating point
> code would not run in the kernel, it would run on data
> stored in the stolen cache area. You can lock both instructions
> and data in the cache; e.g. an FFT routine can be locked in
> the instruction cache, while FFT data is in the data cache.
> I'm not sure how easy this is to do under Linux though.
>
> Here's an example of the level of detail you can get
> down to when benchmarking code:
>
> http://www.ovro.caltech.edu/~dwh/correlator/pdf/dsp_programming.pdf
>
> The FFT routine used on this processor made use of both
> the instruction and data cache (on-chip SRAM) on the
> DSP.
>
> This code is being re-developed to run on an MPC8349EA PowerPC
> with FPU. I did some initial testing to confirm that the
> FPU operates as per the data sheet, and will eventually get
> around to more complete testing.
>
> Which processor were you running your code on, and what
> frequency were you operating the processor at? How does
> the algorithm timing compare when run on other processors,
> e.g. your desktop or laptop machine?
>
> Cheers,
> Dave
> _______________________________________________
> Linuxppc-embedded mailing list
> [email protected]
> https://ozlabs.org/mailman/listinfo/linuxppc-embedded
>
>

--
View this message in context: http://www.nabble.com/floating-point-support-in-the-driver.-tp18772109p18827857.html
Sent from the linuxppc-embedded mailing list archive at Nabble.com.
