Hi,

I hope I'm not flooding the list with this topic. I did some research
and couldn't find anything relevant.
My team is developing a large scale CAD application that has a large
memory footprint, requiring strong machines to run.

The application uses pointers heavily.
During one of our optimization cycles I noticed that since most
objects are aligned on 8-byte boundaries, it's possible to drop the
lower 3 bits of the address and reconstruct the full address later.

Basically, as long as the application stays within a 32 GB range
(2^32 * 2^3 bytes), it's possible to represent aligned pointers using
an unsigned int (4 bytes).
Since recovering the address from the compressed representation only
costs a left shift (which is very cheap), this trick should have an
insignificant performance impact in highly polymorphic code (and
perhaps even in general - I am not sure how well the compiler will
optimize repeated accesses through the same address using this
mechanism).
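
A minimal sketch of the arithmetic described above, assuming 8-byte
alignment and addresses below 2^35. The names compress, decompress and
kAlignBits are hypothetical and are not part of the attached header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant: objects are assumed aligned to 2^3 = 8 bytes.
const unsigned kAlignBits = 3;

// Drop the low (always-zero) bits; what remains fits in 32 bits as long
// as the address is below 2^(32 + kAlignBits) = 2^35 bytes (32 GB).
inline std::uint32_t compress(std::uint64_t addr) {
    assert((addr & ((std::uint64_t(1) << kAlignBits) - 1)) == 0);  // aligned
    assert(addr < (std::uint64_t(1) << (32 + kAlignBits)));        // in range
    return static_cast<std::uint32_t>(addr >> kAlignBits);
}

// Widen to 64 bits first, then shift, so the top bits are not lost.
inline std::uint64_t decompress(std::uint32_t packed) {
    return static_cast<std::uint64_t>(packed) << kAlignBits;
}
```

The highest representable address in this scheme is 2^35 - 8, which
packs to 0xFFFFFFFF.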

After giving this some thought, I realized 3 points:

1. The virtual function table pointer cannot be compressed without
modifying the compiler.
2. Rolling this out (if the idea actually makes sense ;)) will also be
easier through the compiler.
3. 32G of addressable range may not start at 0 and may be dispersed
(although in reality sbrk usually starts a bit above 0 and grows
continuously), so this may require some base address shifting/loader
changes (??)
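
Point 3 can be sketched as follows, assuming the heap occupies one
contiguous region whose lowest address is known. The Arena struct and
the encodable/encode/decode helpers are hypothetical names used only
for illustration; how the base is actually obtained (allocator, loader)
is left open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical descriptor for the region that compressed pointers cover.
struct Arena {
    std::uint64_t base;  // lowest address in the region, assumed 8-byte aligned
};

const unsigned kShift = 3;  // assumed alignment of 2^3 = 8 bytes

// True if addr lies in [base, base + 2^35) and is 8-byte aligned
// relative to the base.
inline bool encodable(const Arena& a, std::uint64_t addr) {
    if (addr < a.base) return false;
    std::uint64_t off = addr - a.base;
    return (off & ((std::uint64_t(1) << kShift) - 1)) == 0 &&
           off < (std::uint64_t(1) << (32 + kShift));
}

// Store the offset from the base, shifted right by the alignment bits.
inline std::uint32_t encode(const Arena& a, std::uint64_t addr) {
    assert(encodable(a, addr));
    return static_cast<std::uint32_t>((addr - a.base) >> kShift);
}

// Widen, shift back, and re-add the base.
inline std::uint64_t decode(const Arena& a, std::uint32_t packed) {
    return a.base + (static_cast<std::uint64_t>(packed) << kShift);
}
```

One consequence of a nonzero base is that the packed value 0 decodes to
the base itself rather than to a null pointer, so null would need
separate handling.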

Would be nice to get your opinions on it.
If it makes any sense and you can give me some basic hints, I may be
able to tailor it into the gcc development branch.

In case I was unable to make sense above, I'm attaching a simple
implementation I did just to check the concept. :)

Thanks for your time,

Yair

---------------------------------------------------------------------------------------------------------------------------------



#ifndef __ALIGNED_PTR_H__
#define __ALIGNED_PTR_H__

// A memory-efficient pointer implementation.
//
// Generally, pointers enable access at byte-level granularity.
// 64-bit pointers are useful to enable access to 2^64 unique byte addresses,
// which is useful for applications with a large memory footprint.
//
// When byte-level granularity is not needed (example: some allocators return
// addresses aligned to sizeof(void*)), it is possible to address >4GB using
// a 32-bit value.
//
// The implementation below assumes the allocator's alignment is
// 2^ALIGNMENT_BITS, and thus access to pointers only requires shifting
// the stored unsigned integer left, which is faster than the
// multiplication that would otherwise be necessary.
//
// For example, if ALIGNMENT_BITS = 3, the actual alignment is 2^3 (8),
// which provides access to addresses up to 2^35 (32GB).
//
// Note that if unaligned addresses, or addresses farther than the
// allowed limit, are sent to aligned_ptr it will assert, and so while
// the user will have to rerun with full 64-bit pointers, there is no
// risk of memory corruption.
//
// Yair Lifshitz, June 2008 :)

#include "assertions.h"

extern unsigned long __aligned_ptr_malloc_base__;

template <class T, unsigned int ALIGNMENT_BITS = 3>
class aligned_ptr
{
 public:

  typedef aligned_ptr<T, ALIGNMENT_BITS> self_type;

  aligned_ptr(): m_ptr(0) {}

  aligned_ptr(T val)
  {
    assert(is_aligned_ptr(val));
    m_ptr = remove_base(val) >> ALIGNMENT_BITS;
  }

  T operator-> () {
      return ptr();
  }

  const T operator->() const {
      return ptr();
  }

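  T ptr() {
      // Widen to unsigned long before shifting so the high bits survive,
      // then add back the base that remove_base() subtracted.
      // Note: with a nonzero base, a default-constructed aligned_ptr
      // decodes to the base address, not to a null pointer.
      unsigned long addr = ((unsigned long)m_ptr << ALIGNMENT_BITS)
                           + __aligned_ptr_malloc_base__;
      return reinterpret_cast<T>(addr);
  }

  T ptr() const {
      unsigned long addr = ((unsigned long)m_ptr << ALIGNMENT_BITS)
                           + __aligned_ptr_malloc_base__;
      return reinterpret_cast<T>(addr);
  }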

  operator T () const {return ptr();}

  self_type& operator= (T val)
  {
      *this = self_type(val);
      return *this;
  }

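  static bool is_aligned_ptr(T val)
  {
      unsigned long val_reffed_to_base = remove_base(val);
      unsigned long ALIGNMENT_MASK = ((unsigned long)1 << ALIGNMENT_BITS) - 1;
      // Reject addresses that are not aligned to 2^ALIGNMENT_BITS.
      if ((val_reffed_to_base & ALIGNMENT_MASK) != 0) return false;
      // Reject offsets that would not fit in an unsigned int after the
      // shift, i.e. offsets at or beyond 2^(32 + ALIGNMENT_BITS).
      if (val_reffed_to_base >=
          ((unsigned long)1 << (sizeof(unsigned int)*8 + ALIGNMENT_BITS))) return false;
      return true;
  }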

private:

    static unsigned long remove_base(const T val)
    {
        return (unsigned long)val - __aligned_ptr_malloc_base__;
    }

   unsigned int m_ptr;
};

#endif
