[ 
https://issues.apache.org/jira/browse/HADOOP-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346793#comment-14346793
 ] 

Edward Nevill commented on HADOOP-11660:
----------------------------------------

Hi,

I did some additional benchmarking of pipelining on aarch64, and it appears 
that I was mistaken: AArch64 is in fact capable of pipelining. The reason I was 
not seeing any improvement was that I chose too large a buffer size. In my 
test I was doing a CRC of 3 x 1MB buffers x 5000 times. The pipelined version 
shows worse performance because it processes all three 1MB buffers in parallel, 
whereas the non-pipelined version processes the 1MB buffers one at a time, 
which is more cache efficient.
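
For readers unfamiliar with the two variants, here is a minimal sketch of the 
difference. It is not the code from the patch. The function names are 
hypothetical, the Castagnoli step uses the ACLE intrinsic __crc32cd when the 
compiler targets aarch64 with the CRC extension, and a bitwise software 
fallback is assumed everywhere else so the sketch still builds. Buffer lengths 
are assumed to be a multiple of 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
/* Hardware CRC32C step over 8 bytes (ACLE intrinsic). */
static inline uint32_t crc32c_u64(uint32_t crc, uint64_t data) {
    return __crc32cd(crc, data);
}
#else
/* Portable bitwise fallback (assumption: only here so the sketch builds
 * on machines without the aarch64 CRC extension). */
static inline uint32_t crc32c_u64(uint32_t crc, uint64_t data) {
    for (int i = 0; i < 8; i++) {
        crc ^= (uint32_t)(data & 0xff);
        data >>= 8;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}
#endif

/* Unaligned-safe 8-byte load. */
static inline uint64_t load_u64(const uint8_t *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Non-pipelined: one buffer, one dependency chain. Each CRC step must
 * wait for the result of the previous step. */
uint32_t crc32c_single(uint32_t crc, const uint8_t *buf, size_t len) {
    assert(len % 8 == 0);
    for (size_t i = 0; i < len; i += 8)
        crc = crc32c_u64(crc, load_u64(buf + i));
    return crc;
}

/* Pipelined: three independent dependency chains interleaved in one
 * loop, so the CPU can overlap the latency of consecutive CRC steps. */
void crc32c_pipelined3(uint32_t crc[3], const uint8_t *b0,
                       const uint8_t *b1, const uint8_t *b2, size_t len) {
    assert(len % 8 == 0);
    uint32_t c0 = crc[0], c1 = crc[1], c2 = crc[2];
    for (size_t i = 0; i < len; i += 8) {
        c0 = crc32c_u64(c0, load_u64(b0 + i));
        c1 = crc32c_u64(c1, load_u64(b1 + i));
        c2 = crc32c_u64(c2, load_u64(b2 + i));
    }
    crc[0] = c0; crc[1] = c1; crc[2] = c2;
}

/* Returns 1 if both variants produce the same CRCs for three 16KB
 * test buffers, 0 otherwise. */
int crc32c_variants_agree(void) {
    enum { LEN = 16 * 1024 };
    static uint8_t bufs[3][LEN];
    for (int b = 0; b < 3; b++)
        for (size_t i = 0; i < LEN; i++)
            bufs[b][i] = (uint8_t)(i * 31u + (unsigned)b * 7u);

    uint32_t piped[3] = { 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu };
    crc32c_pipelined3(piped, bufs[0], bufs[1], bufs[2], LEN);
    for (int b = 0; b < 3; b++)
        if (crc32c_single(0xFFFFFFFFu, bufs[b], LEN) != piped[b])
            return 0;
    return 1;
}
```

The interleaved loop touches all three buffers on every iteration, which is 
why its working set is three times that of the single-buffer loop.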

I reduced the buffer size to 16KB and increased the number of iterations from 
5000 to 500000. This generated the following results.

{code}
NON PIPELINED crc1 = 783797200, crc2 = 610683550, crc3 = -1644088667
time = 6.16
PIPELINED crc1 = -2031343782, crc2 = -2043588942, crc3 = 554161471
time = 4.61
{code}
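
A timing loop of the kind used for these measurements could look roughly like 
the sketch below. This is an assumption about the shape of the harness, not 
the actual test code: time_iterations, bench_fn and count_calls are 
hypothetical names, and clock() is used only because it is standard C (it 
measures CPU time rather than wall-clock time).

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one benchmark iteration over some context. */
typedef void (*bench_fn)(void *ctx);

/* Runs fn(ctx) the requested number of times and returns the elapsed
 * CPU time in seconds as reported by clock(). */
double time_iterations(bench_fn fn, void *ctx, long iterations) {
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        fn(ctx);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Trivial stand-in workload: counts how often it was called. A real
 * run would instead CRC the three buffers on each call. */
void count_calls(void *ctx) {
    long *calls = ctx;
    (*calls)++;
}
```

Timing the pipelined and non-pipelined routines with the same iteration count 
and buffer size keeps the two numbers directly comparable.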

I then replaced the CRC instruction with an ADD instruction (which always 
completes in 1 cycle) and got the following result.

{code}
NON PIPELINED crc1 = -1928994468, crc2 = -1747836272, crc3 = -674545616
time = 4.13
PIPELINED crc1 = -2096826240, crc2 = -334553600, crc3 = 1911147008
time = 4.19
{code}

This shows that the CRC instruction is pipelined: the pipelined CRC version 
(4.61) comes close to the ADD version (4.19), so it is effectively completing 
close to one CRC per cycle.

My bad. I will submit another patch within the next couple of days which 
includes pipelining. Thanks for your patience!

Ed.


> Add support for hardware crc on ARM aarch64 architecture
> --------------------------------------------------------
>
>                 Key: HADOOP-11660
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11660
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 3.0.0
>         Environment: ARM aarch64 development platform
>            Reporter: Edward Nevill
>            Assignee: Edward Nevill
>            Priority: Minor
>              Labels: performance
>         Attachments: jira-11660.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> This patch adds support for hardware crc for ARM's new 64 bit architecture.
> The patch is completely conditionalized on __aarch64__.
> I have only added support for the non pipelined version as I benchmarked the 
> pipelined version on aarch64 and it showed no performance improvement.
> The aarch64 version supports both Castagnoli and Zlib CRCs as both of these 
> are supported on ARM aarch64 hardware.
> To benchmark this I modified the test_bulk_crc32 test to print out the time 
> taken to CRC a 1MB dataset 1000 times.
> Before:
> CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
> CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
> After:
> CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
> CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
> So this represents a roughly 4.5X performance improvement on raw CRC calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
