On Fri, 2013-11-08 at 11:25 -0500, Neil Horman wrote:
> On Wed, Nov 06, 2013 at 12:07:38PM -0800, Joe Perches wrote:
> > On Wed, 2013-11-06 at 15:02 -0500, Neil Horman wrote:
> > > On Wed, Nov 06, 2013 at 09:19:23AM -0800, Joe Perches wrote:
> > []
> > > > __always_inline instead of inline
> > > > static __always_inline void prefetch_lines(const void *addr, size_t len)
> > > > {
> > > >         const void *end = addr + len;
> > > > ...
> > > > 
> > > > buff doesn't need a void * cast in prefetch_lines
> > > > 
> > > Actually I take back what I said here, we do need the cast, not for a 
> > > conversion
> > > from unsigned char * to void *, but rather to discard the const qualifier
> > > without making the compiler complain.
> > 
> > Not if the function is changed to const void *
> > and end is also const void * as shown.
> > 
> Addr is incremented in the for loop, so it can't be const.  I could add a
> loop counter variable on the stack, but that doesn't seem like it would
> help anything

Perhaps you meant
        void * const addr;
but that's not what I wrote.
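
To make the distinction concrete, here's a minimal sketch (not part of the
patch; the 64-byte stride just stands in for cache_line_size(), which is
kernel-internal):

/* const void *addr: the pointed-to data is const, but the pointer itself
 * may be advanced, so incrementing it in the loop compiles cleanly. */
static void walk_const_data(const void *addr, size_t len)
{
	const void *end = addr + len;	/* void * arithmetic is a gcc extension */

	for (; addr < end; addr += 64)
		;			/* the bytes addr points at are never written */
}

/* void * const addr: the pointer itself is const, so advancing it fails
 * with "assignment of read-only parameter 'addr'". */
static void walk_const_pointer(void *const addr, size_t len)
{
	(void)addr;
	(void)len;
	/* addr += 64;  would not compile */
}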

Let me know if this doesn't compile.
It does here...
---
 arch/x86/lib/csum-partial_64.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index 9845371..891194a 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -29,8 +29,15 @@ static inline unsigned short from32to16(unsigned a)
  * Things tried and found to not make it faster:
  * Manual Prefetching
  * Unrolling to an 128 bytes inner loop.
- * Using interleaving with more registers to break the carry chains.
  */
+
+static __always_inline void prefetch_lines(const void *addr, size_t len)
+{
+       const void *end = addr + len;
+       for (; addr < end; addr += cache_line_size())
+               asm("prefetch 0(%[buf])\n\t" : : [buf] "r" (addr));
+}
+
 static unsigned do_csum(const unsigned char *buff, unsigned len)
 {
        unsigned odd, count;
@@ -67,7 +74,9 @@ static unsigned do_csum(const unsigned char *buff, unsigned len)
                        /* main loop using 64byte blocks */
                        zero = 0;
                        count64 = count >> 3;
-                       while (count64) { 
+
+                       prefetch_lines(buff, min(len, cache_line_size() * 4u));
+                       while (count64) {
                                asm("addq 0*8(%[src]),%[res]\n\t"
                                    "adcq 1*8(%[src]),%[res]\n\t"
                                    "adcq 2*8(%[src]),%[res]\n\t"

