On 19/08/11 07:35, Alexander Kjeldaas wrote:


On 18 August 2011 23:38, Simon Marlow <marlo...@gmail.com
<mailto:marlo...@gmail.com>> wrote:

    On 18/08/11 11:47, Johan Tibell wrote:

        On Thu, Aug 18, 2011 at 12:43 PM, Alexander Kjeldaas
        <alexander.kjeld...@gmail.com
        <mailto:alexander.kjeld...@gmail.com>>  wrote:

            Unaligned word-sized loads work fine on x86, and this would
            be x86-64 only,
            or even Nehalem (and later) only.   Or,  from a cost
            perspective, it could
            be interesting for non-Nehalem as well, as RAM is (usually)
            the most
            expensive component when running a server.  (But anyways,
            developing
            features for pre-Nehalem is slightly pointless IMHO).
            So I guess the alignment would be per-architecture or a
            flag.  If it is
            currently hardcoded, it would have to be configurable.
            Alexander


        I think the problem is that there's code shared between all
        platforms that relies on the alignment. This code would have to be
        parameterized by the platform. It's potentially a very big task.


    Right, it's a big job.  The word-sized unit of allocation is fairly
    deeply wired in.

    Still, it's an interesting idea, and not impossible.  Objects would
    have to start on a word boundary due to pointer tagging as Ben
    points out, but within an object smaller pointers could be used.
      But isn't it 48-bit pointers, not 40-bit? (48-bit pointers aren't
    quite so good, because a cons cell spills over into 3 words rather
    than squeezing into 2).


40 bit is 1TiB which should be fine for a limited subset of programs
(covering 99.99% ;-), right?

Well, 48 bits covers the whole virtual address space, whereas with 40 bits we'd have to add a base address and do something OS-specific to ensure that we don't use more than 40 bits of address space (right now we just take whatever memory mmap() gives us).
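
A minimal sketch in C of the base-address idea, assuming x86-64 and a heap reserved as one contiguous region of at most 1TiB. The names heap_base, compress_ptr and decompress_ptr are hypothetical and are not part of the GHC RTS; how the region gets reserved is the OS-specific part and is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define PTR_BITS  40
#define PTR_LIMIT (UINT64_C(1) << PTR_BITS)   /* 2^40 bytes = 1TiB */

/* Hypothetical: start of a heap region reserved in one contiguous piece. */
static uintptr_t heap_base;

/* Turn a full address into a 40-bit offset from heap_base. */
static uint64_t compress_ptr(uintptr_t addr) {
    assert(addr >= heap_base);
    uint64_t off = (uint64_t)(addr - heap_base);
    assert(off < PTR_LIMIT);                  /* must fit in 40 bits */
    return off;
}

/* Recover the full address from a 40-bit offset. */
static uintptr_t decompress_ptr(uint64_t off) {
    assert(off < PTR_LIMIT);
    return heap_base + (uintptr_t)off;
}
```

The asserts mark the constraint that matters: every heap address must lie within 2^40 bytes of the base, which is not guaranteed when memory is simply taken from wherever mmap() places it.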

We can also load the pointer into the upper part of the register and do
a right shift.  That way we can optionally clear the tag bits in the
same operation, and instead use the data size modifier during the load.
(If the GHC wiki is correct in that pointer tags need to be cleared to
dereference a pointer, then there is no additional penalty for using a
shift instead in this case).
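
One possible reading of the load-and-shift scheme, sketched in C. The assumptions here are mine: a little-endian machine, a 5-byte field holding a byte offset whose low 3 bits are the tag, and 3 readable bytes in front of every field. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FIELD_BYTES 5     /* 40-bit pointer field */
#define TAG_BITS    3
#define JUNK_BITS   24    /* 64 - 40: low register bits from neighbouring bytes */

/* Store a word-aligned 40-bit offset plus a 3-bit tag (little-endian). */
static void store_field(uint8_t *field, uint64_t offset, unsigned tag) {
    assert(offset < (UINT64_C(1) << 40) && (offset & 7) == 0 && tag < 8);
    uint64_t value = offset | tag;
    memcpy(field, &value, FIELD_BYTES);
}

/* Unaligned 8-byte load ending at the end of the field, so the field
   lands in the upper 40 bits of the register. */
static uint64_t load_high(const uint8_t *field) {
    uint64_t reg;
    memcpy(&reg, field - 3, sizeof reg);
    return reg;
}

/* A single shift drops the junk bits and the tag bits together. */
static uint64_t word_index(uint64_t reg) {
    return reg >> (JUNK_BITS + TAG_BITS);
}

/* The tag is still available with a shorter shift and a mask. */
static unsigned tag_of(uint64_t reg) {
    return (unsigned)((reg >> JUNK_BITS) & 7);
}

/* base + index * 8, which x86 can express as a scaled addressing mode. */
static const uint64_t *deref(const uint64_t *base, uint64_t reg) {
    return base + word_index(reg);
}
```

Because the shift already yields a word index, the multiplication by 8 is left to the addressing mode rather than done with a second instruction.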

If we do the above, we also get one tag bit into the carry flag for
free, so if one of the tag bits is used more frequently than the others,
put it at the highest tag bit.
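
A small C model of why the highest tag bit ends up in the carry flag: x86 SHR sets CF to the last bit shifted out. The sketch assumes the 40-bit field sits in the top of the register with 3 tag bits directly above 24 junk bits, so the shift count is 27; shr_with_carry is a hypothetical helper, not a real intrinsic.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t value;   /* register contents after the shift */
    unsigned carry;   /* what CF would hold */
} shr_result;

/* Model of x86 SHR for counts 1..63: CF receives the last bit shifted out. */
static shr_result shr_with_carry(uint64_t reg, unsigned count) {
    assert(count >= 1 && count <= 63);
    shr_result r = { reg >> count, (unsigned)((reg >> (count - 1)) & 1) };
    return r;
}
```

With a count of 27 the last bit shifted out is register bit 26, which is the top tag bit under the layout assumed above, so a conditional jump on CF can test it without a separate mask.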

Alternatively, add one extra tag bit so we have 4 of them.  Then shift
by 25 and the extra tag bit would be in CF at the cost of 0.5TiB memory
space.

It is easy to see that this can be expanded to 27 tag bits in the
extreme case at the cost of a MOV.

In 2013, the next Intel generation after Sandy Bridge will have a
3-argument BEXTR instruction.  In that case one can load the required
bits directly from memory into a register, so there's an upgrade path
here :-).  One will need a register pre-populated with the required
shift/extract information though.
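
For reference, a portable C emulation of what BEXTR computes, as I understand the instruction: the control operand carries the start bit in bits 7:0 and the length in bits 15:8. The helper names are hypothetical; a real build would use the instruction itself rather than this function.

```c
#include <assert.h>
#include <stdint.h>

/* Build the control word: start bit in bits 7:0, length in bits 15:8. */
static uint16_t bextr_control(unsigned start, unsigned len) {
    assert(start < 256 && len < 256);
    return (uint16_t)((len << 8) | start);
}

/* Extract `len` bits of src beginning at bit `start`, zero-extended. */
static uint64_t bextr_emulated(uint64_t src, uint16_t control) {
    unsigned start = control & 0xff;
    unsigned len   = control >> 8;
    if (start >= 64 || len == 0) return 0;
    uint64_t shifted = src >> start;
    if (len >= 64) return shifted;
    return shifted & ((UINT64_C(1) << len) - 1);
}
```

Under the layout assumed earlier (24 junk bits, then 3 tag bits, then the pointer), a control word of bextr_control(27, 37) would pull out the word index and bextr_control(24, 3) the tag, each with one instruction.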

We can actually have an arbitrary amount of aux material: we can have
variable-length tag bits.
If we load the pointer into the top of the register, we can always shift
by a fixed amount to dereference the pointer.  However, when we need to
access tag bits, we can reserve a tag pattern that indicates that there
are additional tag bytes following the 40-bit pointer that we can
inspect.  That should be pretty powerful because it allows for variable
length tag bits.
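
A rough C sketch of the reserved-pattern idea, under assumptions of my own: the tag value 7 is the escape pattern, exactly one extension byte follows the 5-byte pointer when the escape is set, and the machine is little-endian. TAG_ESCAPE, field_info and read_field are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG_ESCAPE 7u   /* hypothetical reserved pattern: more tag bytes follow */

typedef struct {
    unsigned tag;    /* the 3 inline tag bits */
    unsigned ext;    /* extension byte, 0 when absent */
    size_t   size;   /* bytes occupied by the whole field */
} field_info;

/* Decode a field: 5 pointer bytes, plus one extension byte if escaped. */
static field_info read_field(const uint8_t *field, uint64_t *offset) {
    assert(field != NULL && offset != NULL);
    uint64_t raw = 0;
    memcpy(&raw, field, 5);                  /* low 40 bits, little-endian */
    field_info info = { (unsigned)(raw & 7), 0, 5 };
    *offset = raw & ~UINT64_C(7);            /* pointer with tag bits cleared */
    if (info.tag == TAG_ESCAPE) {
        info.ext  = field[5];                /* extra tag material */
        info.size = 6;
    }
    return info;
}
```

The pointer itself is always decoded the same way; only code that cares about the tag has to look at the extra byte, which is the property the fixed shift relies on.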

(Oh, and if we have variable-length tag bits, and some of the tag bits
are discardable, they could be discarded aggressively in the global heap
by the GC, but that would require lots of pointer fixups.. ouch)

Lots of ideas to ponder here. I'm usually pretty reluctant to do deep architecture-specific optimisation at the level of the execution model though. But there's plenty of scope for experimentation - do you plan to try any of this yourself?

Cheers,
        Simon

_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc