The unaligned accesses in libpng are, for the large copies, a bug. Our 
attempt to align the row buffer to a 16-byte boundary was off by one, so we 
always end up misaligning it. I've posted a patch on the png-mng-implement list:

http://sourceforge.net/mailarchive/message.php?msg_id=28194444

The time spent in memcpy() is probably an illusion. The data out of zlib gets 
copied to one row buffer, where it is unfiltered (if necessary); then a copy is 
made in a separate buffer that is only used for the filter handling. If you 
test using images with large rows (I don't know what pngbench does), the copy 
buffer may well get flushed out of the second-level cache between rows, and 
then the memcpy() will stall while bringing it back in.

If you have machine-level profiling, you may see this as a massive time spike 
on some, probably unrelated, instruction that just happens to be at the PC when 
the stall stops everything.

Anyway, I have several ideas of how to avoid the copy when it isn't required.

John Bowler <jbow...@acm.org>

-----Original Message-----
From: Glenn Randers-Pehrson [mailto:glen...@gmail.com] 
Sent: Monday, October 03, 2011 1:15 PM
To: PNG/MNG implementation discussion list
Subject: [png-mng-implement] Use of memcpy() in libpng [Fwd from linaro-toolchain list]

Re: Use of memcpy() in libpng

David Gilbert
Tue, 27 Sep 2011 06:20:14 -0700

On 27 September 2011 14:16, Christian Robottom Reis <k...@linaro.org> wrote:
> On Tue, Sep 27, 2011 at 09:47:33AM +0100, Ramana Radhakrishnan wrote:
>> On 26 September 2011 21:51, Michael Hope <michael.h...@linaro.org> wrote:
>> > Saw this on the linaro-multimedia list:
>> >
>> > http://lists.linaro.org/pipermail/linaro-multimedia/2011-September/000074.html
>> >
>> > libpng spends a significant amount of time in memcpy().  This might 
>> > tie in with Ramana's investigation or the unaligned access work by 
>> > allowing more memcpy()s to be inlined.
>>
>> It's the unaligned access and the changes/improvements to the memcpy 
>> that *might* help in this case. But that of course depends on the 
>> compiler knowing when it can do such a thing. Of course, what might be 
>> more interesting is the kind of workload analysis that Dave's done in 
>> the past with memcpy to know what the alignment and size of the 
>> buffer being copied are.
>
> If you guys could take a look at this, there is a potential requirement 
> for the MMWG around libpng optimization; we could fit this in along 
> with other work (possibly vectorizing, etc.) on that component.

It wouldn't take long to analyse the memcpy calls; life would be easier if we 
had the test program and some details, such as what image sizes were used in 
these benchmarks.

Dave


_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
