Adapteva's CEO, Andreas Olofsson, gave a very well attended talk at ORNL on Friday about how to program a 16,000-core chip. It was interesting, though it was more about the architecture and design choices than about actually programming a 16K-core chip. The work is most impressive given that it was done by a team of three over a period of three months.
The cores are simple dual-issue RISC cores, each with 32 KB of scratchpad memory and a network router. There is no cache and no coherency protocol. Every core can read and write every other core's memory, so the chip can appear as a distributed shared-memory machine: non-local accesses are automatically converted into network transactions and sent out over the network-on-chip (NoC). The cores form a 2D mesh, and routing goes east/west first, then north/south. Nearest-neighbor latency is 4 ns for writes and 16 ns for reads; farthest-neighbor latency is 16 ns for writes and 30 ns for reads. He claims they could build a 1,024-core chip today if there were demand for it.

The initial markets are telecom, military, and medical, and the applications best suited to the chip are those that would otherwise need a DSP. For HPC, they claim 102 GF/s at 2 watts (51 GF/watt), which is almost exascale class: 1 EF/s in 20 MW works out to 50 GF/watt, ignoring cooling, networks, etc. It currently supports only single-precision floating point, but they can add double precision given enough demand. Depending on how much memory per core is configured, a double-precision version could deliver a peak about 30-40% lower than the current board's.

They support C/C++ and OpenCL. In practice, OpenCL is translated to C++, and C++ support is limited by the small amount of per-core memory. That said, if the bulk of your program fits in under 1,500 lines of C, he asserts that it will scream.

Lastly, once all the Kickstarter boards have shipped, they hope to make the boards available on Amazon for immediate delivery.

Scott

On Fri, May 23, 2014 at 9:32 AM, Eugen Leitl <eu...@leitl.org> wrote:
>
> After I've finally gotten my Kickstart backer board and set it
> up to boot (you will need the included heatsink on the Zynq 7020
> as well as a small fan) I've ran a few included benchmarks.
>
> In no particular order of relevance:
>
> linaro-nano:~/Parallella/epiphany-examples/mesh_bandwidth_all2one> ./run.sh
> 0x0000417e!
> The bandwidth of all-to-one is 4193.00MB/s!
>
>
> linaro-nano:~/Parallella/epiphany-examples/mesh_bandwidth_bisection>
> ./run.sh
> 0x00000f46!
> The bandwidth of bisection is 9590.00MB/s!
>
> linaro-nano:~/Parallella/epiphany-examples/basic_math> ./run.sh
>
> The clock cycle count for addition is 5.
>
> The clock cycle count for subtraction is 5.
>
> The clock cycle count for multiplication is 6.
>
> The clock cycle count for division is 47.
>
> The clock cycle count for "fmodf()" is 66635.
>
> The clock cycle count for "sinf()" is 23930.
>
> The clock cycle count for "cosf()" is 51115.
>
> The clock cycle count for "sqrtf()" is 93785.
>
> The clock cycle count for "ceilf()" is 18475.
>
> The clock cycle count for "floorf()" is 17690.
>
> The clock cycle count for "log10f()" is 10735.
>
> The clock cycle count for "logf()" is 9976.
>
> The clock cycle count for "powf()" is 348243.
>
> The clock cycle count for "ldexpf()" is 36306.
>
> linaro-nano:~/Parallella/epiphany-examples/matmul-16> ./run.sh
>
> Matrix: C[512][512] = A[512][512] * B[512][512]
>
> Using 4 x 4 cores
>
> Seed = 0.000000
> Loading program on Epiphany chip...
> Writing C[1048576B] to address 00200000...
> Writing A[1048576B] to address 00000000...
> Writing B[1048576B] to address 00100000...
> GO Epiphany! ... Writing the GO!...
> Done...
> Finished calculating Epiphany result.
> Reading result from address 00200000...
> Calculating result on Host ... Finished calculating Host result.
> Reading time from address 00300008...
>
> *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
> Verifying result correctness ... C_epiphany == C_host
> *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
>
> Epiphany - time: 153.0 msec (@ 600 MHz)
> Host - time: 1867.2 msec (@ 667 MHz)
>
> * * * EPIPHANY FTW !!! * * *
>
> I can run the rest of the examples and post numbers if there's
> interest:
>
> naro-nano:~/Parallella/epiphany-examples> ls -la
> total 152
> drwxrwxr-x 36 linaro linaro 4096 May 22 15:46 ./
> drwxrwxr-x 5 linaro linaro 4096 Mar 7 12:09 ../
> drwxrwxr-x 8 linaro linaro 4096 Mar 6 23:47 .git/
> -rw-rw-r-- 1 linaro linaro 227 Mar 6 23:42 .gitignore
> -rw-rw-r-- 1 linaro linaro 1464 Mar 6 23:42 README.md
> drwxrwxr-x 4 linaro linaro 4096 May 17 11:47 assembly/
> drwxrwxr-x 4 linaro linaro 4096 Mar 6 23:44 basic_math/
> drwxrwxr-x 4 linaro linaro 4096 Mar 6 23:47 clockgating_mode/
> drwxrwxr-x 4 linaro linaro 4096 May 17 11:48 ctimer/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 dma_2d/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 dma_chain/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 dma_interrupt/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 dma_message_read/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 dma_message_write/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 dma_slave/
> drwxrwxr-x 4 linaro linaro 4096 May 22 15:48 e-dump-mem/
> drwxrwxr-x 4 linaro linaro 4096 May 22 15:46 e-dump-regs/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 e-mem-sync/
> drwxrwxr-x 4 linaro linaro 4096 Mar 6 23:43 e-toggle-led/
> drwxrwxr-x 4 linaro linaro 4096 May 22 12:48 emesh_read_latency/
> drwxrwxr-x 4 linaro linaro 4096 May 22 12:48 emesh_traffic/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 erm/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 erm_example/
> drwxrwxr-x 4 linaro linaro 4096 Mar 6 23:42 fft2d/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 hardware_barrier/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 hardware_loops/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 hello_parallella/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 interrupts/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 link_lowpower_mode/
> drwxrwxr-x 4 linaro linaro 4096 Mar 7 02:04 matmul-16/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 mem_protect/
> drwxrwxr-x 4 linaro linaro 4096 May 23 13:26 mesh_bandwidth_all2one/
> drwxrwxr-x 4 linaro linaro 4096 May 22 12:42 mesh_bandwidth_bisection/
> drwxrwxr-x 4 linaro linaro 4096 May 22 12:41 mesh_bandwidth_neighbour/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 mutex/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 nested_interrupts/
> drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 register_test/
> drwxrwxr-x 4 linaro linaro 4096 May 22 12:07 remote_call/
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>