Hi, I hope I'm not flooding the list with this topic. I did some research and couldn't find anything relevant. My team is developing a large-scale CAD application that has a large memory footprint and requires powerful machines to run.
The application uses pointers heavily. During one of our optimization cycles I noticed that since most objects are aligned on 8-byte boundaries, it's possible to drop the lower 3 bits of the address and reconstruct the full address later. Basically, as long as the application stays within a 32 GB range (2^32 * 2^3 bytes), it's possible to represent aligned pointers using an unsigned int (4 bytes).

Since obtaining the address from the compressed representation only costs a left shift (which is very cheap), this trick should have an insignificant performance impact in highly polymorphic code (and perhaps even in general; I am not sure how well the compiler will optimize multiple accesses through the same compressed pointer).

After giving this some thought, I realized a few points:

1. The virtual function table pointer cannot be compressed without changing the compiler.
2. Spreading this (if the idea actually makes sense ;)) will also be easier through the compiler.
3. The 32 GB of addressable range may not start at 0 and may be dispersed (although in reality sbrk usually starts a bit above 0 and grows contiguously), so this may require some base address shifting/loader changes (??)

It would be nice to get your opinions on it. If it makes any sense and you can give me some basic hints, I may be able to tailor it into the gcc development branch.

In case the above didn't make sense, I'm attaching a simple implementation I wrote just to check the concept. :)

Thanks for your time,
Yair

---------------------------------------------------------------------------------------------------------------------------------
#ifndef __ALIGNED_PTR_H__
#define __ALIGNED_PTR_H__

// A memory-efficient pointer implementation.
//
// Generally, pointers enable access at byte-level granularity.
// 64-bit pointers enable access to 2^64 unique byte addresses,
// which is useful for applications with a large memory footprint.
//
// When byte-level granularity is not needed (example: some allocators return
// addresses aligned to sizeof(void*)), it is possible to address >4GB using
// a 32-bit value.
//
// The implementation below assumes the allocator's alignment is 2^ALIGNMENT_BITS,
// and thus access to pointers only requires shifting the stored unsigned integer
// left and adding the base address, which is faster than the multiplication
// that would otherwise be necessary.
//
// For example, if ALIGNMENT_BITS = 3, the actual alignment is 2^3 (8),
// which provides access to addresses up to 2^35 (32GB) above the base.
//
// Note that if unaligned addresses, or addresses farther than the allowed limit,
// are sent to aligned_ptr it will assert, and so while the user will have to
// rerun with full 64-bit pointers, there is no risk of memory corruption.
//
// Caveat: a default-constructed aligned_ptr decodes to the base address,
// which is a null pointer only when the base is 0.
//
// Yair Lifshitz, June 2008 :)

#include "assertions.h"

extern unsigned long __aligned_ptr_malloc_base__;

template <class T, unsigned int ALIGNMENT_BITS = 3>
class aligned_ptr
{
public:
  typedef aligned_ptr<T, ALIGNMENT_BITS> self_type;

  aligned_ptr(): m_ptr(0) {}

  aligned_ptr(T val)
  {
    assert(is_aligned_ptr(val));
    m_ptr = (unsigned int)(remove_base(val) >> ALIGNMENT_BITS);
  }

  T operator-> () { return ptr(); }
  const T operator->() const { return ptr(); }

  T ptr() const
  {
    // Widen to unsigned long BEFORE shifting; shifting the 32-bit m_ptr
    // directly would discard the upper bits. Then add the base back,
    // mirroring remove_base().
    unsigned long ptr = ((unsigned long)m_ptr << ALIGNMENT_BITS)
                        + __aligned_ptr_malloc_base__;
    return reinterpret_cast<T>(ptr);
  }

  operator T () const { return ptr(); }

  self_type& operator= (T val)
  {
    *this = self_type(val);
    return *this;
  }

  static bool is_aligned_ptr(T val)
  {
    unsigned long val_reffed_to_base = remove_base(val);
    unsigned long ALIGNMENT_MASK = (1UL << ALIGNMENT_BITS) - 1;

    // Reject addresses that are not aligned to 2^ALIGNMENT_BITS.
    if ((val_reffed_to_base & ALIGNMENT_MASK) != 0)
      return false;

    // Reject offsets whose shifted value does not fit in an unsigned int,
    // i.e. offsets of 2^(32 + ALIGNMENT_BITS) or more.
    // Assumes a 64-bit unsigned long (LP64).
    if ((val_reffed_to_base >> ALIGNMENT_BITS) >= (1UL << (sizeof(unsigned int) * 8)))
      return false;

    return true;
  }

private:
  static unsigned long remove_base(const T val)
  {
    return (unsigned long)val - __aligned_ptr_malloc_base__;
  }

  unsigned int m_ptr;
};

#endif
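
To make the arithmetic concrete, here is a minimal standalone sketch of the compress/decompress round trip, separate from the header above. The names compress, decompress and kAlignmentBits are hypothetical and used only for illustration, and the sketch assumes a 64-bit target. It takes the base as an explicit argument rather than a global, and widens the stored value to 64 bits before shifting so that offsets above 4GB survive.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 64-bit pointers, as in the scenario described in this post.
static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes a 64-bit target");

// Hypothetical constant: 3 alignment bits means 8-byte alignment.
constexpr unsigned kAlignmentBits = 3;

// Compress p relative to base. Returns false (leaving out untouched) when p
// is below base, misaligned, or too far from base to fit in 32 bits.
inline bool compress(const void* p, std::uintptr_t base, std::uint32_t& out)
{
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    if (addr < base)
        return false;

    const std::uintptr_t offset = addr - base;
    const std::uintptr_t mask = (std::uintptr_t{1} << kAlignmentBits) - 1;
    if ((offset & mask) != 0)
        return false;                       // not aligned to 2^kAlignmentBits

    const std::uintptr_t index = offset >> kAlignmentBits;
    if (index > UINT32_MAX)
        return false;                       // offset of 2^35 bytes or more

    out = static_cast<std::uint32_t>(index);
    return true;
}

// Decompress: widen to 64 bits first, then shift, then add the base back.
inline void* decompress(std::uint32_t c, std::uintptr_t base)
{
    const std::uintptr_t offset = static_cast<std::uintptr_t>(c) << kAlignmentBits;
    return reinterpret_cast<void*>(base + offset);
}
```

With a base of 0x10000, an address 5GB above the base compresses and decompresses back to itself, while an address at base + 4 (misaligned) or base + 2^35 (out of range) is rejected. The addresses in such a check can be synthetic integers, since nothing is dereferenced.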