Hi all,

I have updated my blog to reflect my current understanding of the cache performance issue and what I have tried so far.
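
For anyone following along, this is roughly how I understand the cacheability attributes that each first-level translation table entry carries on ARMv7-A, since that is what the experiments in this mail revolve around. It is a minimal sketch, not the RTEMS code: the macro and function names are made up for illustration, and it assumes the short-descriptor format with 1 MiB sections, TEX remap disabled (SCTLR.TRE = 0), domain 0 and full read/write access.

```c
#include <stdint.h>

/* Hypothetical names for the fields of an ARMv7-A short-descriptor
 * first-level section entry (these are not the RTEMS macros). */
#define SECT_TYPE    (0x2u)                          /* bits[1:0] = 0b10: 1 MiB section */
#define SECT_AP_RW   (0x3u << 10)                    /* AP[1:0] = 0b11, AP[2] = 0: read/write */
#define SECT_TEX(x)  (((uint32_t)(x) & 0x7u) << 12)  /* TEX[2:0] in bits[14:12] */
#define SECT_S       (1u << 16)                      /* shareable */

/* Cache policy codes used when TEX[2] = 1 (only valid with SCTLR.TRE = 0). */
enum cache_policy {
  POLICY_NON_CACHEABLE = 0,
  POLICY_WB_WA         = 1,  /* write-back, write-allocate */
  POLICY_WT_NO_WA      = 2,  /* write-through, no write-allocate */
  POLICY_WB_NO_WA      = 3   /* write-back, no write-allocate */
};

/* Build the attribute bits of a section entry for normal memory.
 * TEX = 0b1BB selects the outer policy BB; C:B = AA selects the inner policy. */
static inline uint32_t section_attr(enum cache_policy inner,
                                    enum cache_policy outer,
                                    int shareable)
{
  uint32_t attr = SECT_TYPE | SECT_AP_RW;

  attr |= SECT_TEX(0x4u | ((uint32_t)outer & 0x3u)); /* outer policy in TEX[1:0] */
  attr |= ((uint32_t)inner & 0x3u) << 2;             /* inner policy in C (bit 3), B (bit 2) */
  if (shareable) {
    attr |= SECT_S;
  }
  return attr;
}

/* Combine a 1 MiB aligned physical base address with the attribute bits. */
static inline uint32_t section_entry(uint32_t phys_base, uint32_t attr)
{
  return (phys_base & 0xFFF00000u) | attr;
}
```

With this encoding, shareable memory that is write-back, write-allocate at both levels comes out as section_attr(POLICY_WB_WA, POLICY_WB_WA, 1) == 0x15C06, and the framebuffer mapping described in the forum post quoted further down (inner write-through, outer write-back write-allocate) would be section_attr(POLICY_WT_NO_WA, POLICY_WB_WA, 0) == 0x5C0A. These bits only matter once the MMU and caches are actually enabled in SCTLR.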

Lately I have been experimenting with the memory attributes in the mm_config_table. One set of configurations that marks memory as cacheable (at both inner and outer levels) ended up reducing performance further, although I had expected it to improve. So the setup of this table clearly affects performance. However, the results still do not improve after turning on the cache, so the memory sections are perhaps not being cached at all. I suspect this has to do with the mm_config_table. The updates in the GitHub code and on the blog might help further discussion.

Link to GitHub code: https://github.com/krohini1593/rtems/tree/rohini
Link to blog: <http://rohiniwithrpi2.blogspot.in/p/blog-page_3.html>

Thanks!

On Mon, Jun 15, 2015 at 8:29 PM, Alan Cudmore <alan.cudm...@gmail.com> wrote:

> Hi,
> Some of the code examples may give you some clues. Like this one:
> https://github.com/mrvn/test/blob/master/smp.cc
>
> Or this:
> https://github.com/PeterLemon/RaspberryPi/tree/master/SMP/SMPINIT
>
> If you still can't figure it out, you can always join the raspberrypi.org
> forums and ask on this thread:
> https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904
>
> When it comes to the Pi 2 and SMP, you are our RTEMS expert :)
>
> Thanks,
> Alan
>
> On Sat, Jun 13, 2015 at 2:29 PM, Rohini Kulkarni <krohini1...@gmail.com>
> wrote:
>
>> Hi,
>>
>> This is regarding Pi 2 SMP support. After powering on, the secondary
>> cores read one of their four mailbox registers and wait for a non-zero
>> content to be written. This content is to be the physical address of the
>> location from where the cores are expected to start execution.
>>
>> I am stuck at figuring out this address. How should I go about
>> understanding this?
>>
>> Thanks!
>> On 3 Jun 2015 19:44, "Gedare Bloom" <ged...@gwu.edu> wrote:
>>
>>> On Wed, Jun 3, 2015 at 2:39 AM, Rohini Kulkarni <krohini1...@gmail.com>
>>> wrote:
>>> > But, I can't say cache configurations have a role here.
>>> >
>>> > I'll push my code to my github project soon.
>>> >
>>> > P.S.
>>> > The Pi2 board I possess seems to have broken down. It just isn't
>>> > turning on. Unable to test further. Will order one immediately.
>>> >
>>> Ouch. Make sure you put it in a safe space for development, clear of
>>> threats like moisture, static shock, and cats.
>>>
>>> > On 3 Jun 2015 09:03, "Rohini Kulkarni" <krohini1...@gmail.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> Alan, your suggestion has resulted in much improvement
>>> >>
>>> >> arm_control=0x1000
>>> >>
>>> >> This has simply worked! Looks like the other cores were taking up plenty
>>> >> of time.
>>> >> I was aware from references that the other cores run a WFI, but ya, did
>>> >> not get its impact.
>>> >> Time for each dhrystone has reduced to 7 from 13 and the no of dhrystones
>>> >> per second also increased.
>>> >>
>>> >> But this is a change only in the config.txt not actually in the boot code.
>>> >>
>>> >> Thanks
>>> >>
>>> >> Rohini
>>> >>
>>> >> On Wed, Jun 3, 2015 at 7:12 AM, Alan Cudmore <alan.cudm...@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> The caches are being enabled on the RPI 1 BSP. The same code is being
>>> >>> executed by the RPI 2 BSP, but obviously it’s not sufficient for the cache
>>> >>> setup.
>>> >>> I have been reading through this long thread, and it is very informative:
>>> >>> https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904
>>> >>>
>>> >>> I am starting to understand the setup that is required to enable caches
>>> >>> on the RPI 2. For example this message near the bottom of page 3 gives a
>>> >>> good indication of the speedup available by configuring the MMU and caches
>>> >>> correctly:
>>> >>> Quote from above thread
>>> >>> ------------------------------
>>> >>> Enabling I/D caches and branch prediction, just like the julia demo uses,
>>> >>> it takes ~12 seconds, or ~21 fps. It's just one core but also a much smaller
>>> >>> loop than the julia demo has.
>>> >>>
>>> >>> Enabling the MMU and mapping memory inner/outer write-back, write
>>> >>> allocate and the framebuffer inner write-through, no write allocate + outer
>>> >>> write-back, write-allocate it takes ~8 seconds, of 32 fps.
>>> >>>
>>> >>> PS: 640x480x32 with MMU gets me ~256 fps. Must have a greater L2 cache
>>> >>> effect.
>>> >>> -------------------------
>>> >>> End of quote
>>> >>>
>>> >>> The person who posted the above comment (mrvn) posted the code here:
>>> >>> https://github.com/mrvn/test/blob/master/mmu.cc
>>> >>>
>>> >>> Also, it seems that when the Pi 2 starts up, cores 1-3 are put in a wait
>>> >>> loop always accessing the bus. By putting this option in the config.txt file
>>> >>> you can put the other cores to sleep, speeding up the code on core 1.
>>> >>> arm_control=0x1000
>>> >>> It would be worth trying that option to see if the benchmark speeds up.
>>> >>>
>>> >>> Alan
>>> >>>
>>> >>> On Jun 2, 2015, at 8:05 AM, Hesham ALMatary <heshamelmat...@gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>> On Tue, Jun 2, 2015 at 12:41 PM, Rohini Kulkarni <krohini1...@gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>> From what I saw, they have to be enabled separately. Cache/mmu are disabled
>>> >>> upon reset.
>>> >>>
>>> >>> For the existing Raspberry BSP [1] there's a code for MMU/Cache init,
>>> >>> however I don't know about Pi2 and where its code is.
>>> >>>
>>> >>> [1] https://github.com/RTEMS/rtems/tree/master/c/src/lib/libbsp/arm/raspberrypi
>>> >>>
>>> >>> On 2 Jun 2015 16:59, "Hesham ALMatary" <heshamelmat...@gmail.com> wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> Aren't the MMU/Caches enabled by default for RPi [1]?
>>> >>>
>>> >>> [1] https://github.com/RTEMS/rtems/blob/master/c/src/lib/libbsp/arm/shared/mminit.c
>>> >>>
>>> >>> On Tue, Jun 2, 2015 at 12:18 PM, Joel Sherrill
>>> >>> <joel.sherr...@oarcorp.com> wrote:
>>> >>>
>>> >>> On June 2, 2015 7:01:21 AM EDT, Rohini Kulkarni <krohini1...@gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>> Dr. Joel,
>>> >>>
>>> >>> So we can't say something solely on the basis of this result?
>>> >>>
>>> >>> I don't think so. If Linux performs the same, then what you did is as
>>> >>> good as it gets.
>>> >>>
>>> >>> However, if Linux is faster then some setting still isn't right.
>>> >>>
>>> >>> You need a reference measurement to have any confidence. It is possible
>>> >>> you did something but didn't actually turn the cache (or all the cache)
>>> >>> on.
>>> >>>
>>> >>> On 2 Jun 2015 16:28, "Rohini Kulkarni" <krohini1...@gmail.com> wrote:
>>> >>>
>>> >>> I have not run it under linux on pi2 yet. Will have to run and check
>>> >>> the result.
>>> >>>
>>> >>> On 2 Jun 2015 16:16, "Joel Sherrill" <joel.sherr...@oarcorp.com> wrote:
>>> >>>
>>> >>> On June 2, 2015 5:58:33 AM EDT, Rohini Kulkarni <krohini1...@gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>> HI,
>>> >>>
>>> >>> I tried running the dhrystone benchmark with some changes for cache/mmu
>>> >>> set up.
>>> >>>
>>> >>> However, the output shows a reduction in performance.
>>> >>> The time to run through the dhrystone has increased from 12 to 13 and
>>> >>> dhrystones run per second decreased.
>>> >>>
>>> >>> According to this result, things were better with caches disabled.
>>> >>>
>>> >>> I have been working on this since two days and could not figure out an
>>> >>> improvement. Any pointers?
>>> >>>
>>> >>> How did it do under Linux on the Pi2?
>>> >>>
>>> >>> Thanks.
>>> >>>
>>> >>> On Thu, May 28, 2015 at 8:41 PM, Rohini Kulkarni
>>> >>> <krohini1...@gmail.com> wrote:
>>> >>>
>>> >>> Hi All,
>>> >>>
>>> >>> I have to implement the cache coherency support for Cortex A7. But for
>>> >>> A7 MPCore, unlike for A9, I am not able to find any register
>>> >>> description for the Snoop Control Unit from the TRM.
>>> >>>
>>> >>> I need help here on how to proceed.
>>> >>>
>>> >>> Additionally for A9 there is a single bit for A9 in the Auxiliary
>>> >>> Control Register which enables cache broadcast operations. The register
>>> >>> format is different for A7 and again I am unable to find how to achieve
>>> >>> the same for A7.
>>> >>>
>>> >>> Thanks!
>>> >>>
>>> >>> On Tue, May 5, 2015 at 10:42 PM, Joel Sherrill
>>> >>> <joel.sherr...@oarcorp.com> wrote:
>>> >>>
>>> >>> On 5/5/2015 11:11 AM, Rohini Kulkarni wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> I am working with the code for bsp hooks. I am referring to existing
>>> >>> ARM multicore bsp codes, zync mainly.
>>> >>>
>>> >>> 1. There are existing hooks for the raspberry pi. Where should the code
>>> >>> for the Pi2 hooks be added?
>>> >>>
>>> >>> The Pi and Pi2 are remarkably similar so Pi2 should be placed inside
>>> >>> the Pi BSP directory.
>>> >>> There is already a Pi2 variant of that code built. But we know specific
>>> >>> places where there are variances. Depending on the scope of what is
>>> >>> different, it can be as simple as a cpp conditional in a .h to select a
>>> >>> value or two implementations of a single method and the Makefile.am
>>> >>> picking the right file to build based on the board variant.
>>> >>>
>>> >>> The big question to always ask is: Is this specific to the Pi2 and
>>> >>> incompatible with the Pi?
>>> >>>
>>> >>> Since the Pi BSP is still missing capabilities, it is likely code
>>> >>> common to both will be added this summer. For example, did the mailbox
>>> >>> interface change? I don't know but would guess that it didn't. Each new
>>> >>> capability added needs that added.
>>> >>>
>>> >>> And any differences need to be analyzed to pick the least intrusive way
>>> >>> to provide alternate implementations. Or enable special code like the
>>> >>> Pi2 SMP support which is dependent on --enable-smp and being a Pi2.
>>> >>>
>>> >>> 2. Am I right in understanding that I will have to implement A7
>>> >>> specific functions as have been for A9? I am referring specifically to
>>> >>> the arm-a9mpcore-start.h
>>> >>>
>>> >>> Yes.
>>> >>>
>>> >>> If the code is very similar between the a7 and a9, then a discussion
>>> >>> on devel@ should occur to decide the best way to minimize duplication.
>>> >>>
>>> >>> If you end up with a7 specific code, you should follow the location and
>>> >>> naming patterns already established. That places it in
>>> >>> libbsp/arm/shared/...
>>> >>> so it can be used by any BSP with the right SMP core.
>>> >>>
>>> >>> I am referring to existing codes to locate and get hold of what needs
>>> >>> to be done in the hooks. However, being new to such implementations, I
>>> >>> am taking longer to understand the details. Any suggestions that might
>>> >>> help here are welcome
>>> >>>
>>> >>> The answer will depend on the factors listed above. When code can
>>> >>> be shared, we want to share it across as many BSPs as makes sense.
>>> >>> When it is unique to a specific BSP **variant** (e.g. Pi vs Pi2), then
>>> >>> you want to find the way to account for the variation in the least
>>> >>> intrusive code way possible.
>>> >>>
>>> >>> Thanks!
>>> >>>
>>> >>> On 1 May 2015 12:45, "Rohini Kulkarni" <krohini1...@gmail.com> wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> Excited to be a part of this edition of GSoC! Thanks to everybody for
>>> >>> helping me get here and congratulations to all the participating
>>> >>> students!
>>> >>>
>>> >>> So, now getting to work, firstly I wish to know, specifically from my
>>> >>> mentors, any changes that must be made to my proposed project or
>>> >>> schedule.
>>> >>>
>>> >>> Secondly, are there any specifics for the development blog that we need
>>> >>> to create for the project? Over time what is the blog expected to
>>> >>> convey.
>>> >>>
>>> >>> Also, I have to create a new wiki page for my project as none exists. I
>>> >>> want to know how to add one.
>>> >>>
>>> >>> --
>>> >>> Rohini Kulkarni
>>> >>>
>>> >>> --
>>> >>> Joel Sherrill, Ph.D.             Director of Research & Development
>>> >>> joel.sherr...@oarcorp.com        On-Line Applications Research
>>> >>> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
>>> >>> Support Available                (256) 722-9985
>>> >>>
>>> >>> --
>>> >>> Rohini Kulkarni
>>> >>>
>>> >>> --
>>> >>> Rohini Kulkarni
>>> >>>
>>> >>> --joel
>>> >>>
>>> >>> --joel
>>> >>> _______________________________________________
>>> >>> devel mailing list
>>> >>> devel@rtems.org
>>> >>> http://lists.rtems.org/mailman/listinfo/devel
>>> >>>
>>> >>> --
>>> >>> Hesham
>>> >>>
>>> >>> --
>>> >>> Hesham
>>> >>
>>> >> --
>>> >> Rohini Kulkarni
>>
>

--
Rohini Kulkarni