On 18 August 2011 23:38, Simon Marlow <marlo...@gmail.com> wrote:

> On 18/08/11 11:47, Johan Tibell wrote:
>
>> On Thu, Aug 18, 2011 at 12:43 PM, Alexander Kjeldaas
>> <alexander.kjeld...@gmail.com>  wrote:
>>
>>> Unaligned word-sized loads work fine on x86, and this would be x86-64
>>> only, or even Nehalem (and later) only.  Or, from a cost perspective,
>>> it could be interesting for non-Nehalem as well, as RAM is (usually) the
>>> most expensive component when running a server.  (But anyways, developing
>>> features for pre-Nehalem is slightly pointless IMHO).
>>> So I guess the alignment would be per-architecture or a flag.  If it is
>>> currently hardcoded, it would have to be configurable.
>>> Alexander
>>>
>>
>> I think the problem is that there's code shared between all
>> platforms that relies on the alignment. This code would have to be
>> parameterized by the platform. It's potentially a very big task.
>>
>
> Right, it's a big job.  The word-sized unit of allocation is fairly deeply
> wired in.
>
> Still, it's an interesting idea, and not impossible.  Objects would have to
> start on a word boundary due to pointer tagging as Ben points out, but
> within an object smaller pointers could be used.  But isn't it 48-bit
> pointers, not 40-bit? (48-bit pointers aren't quite so good, because a cons
> cell spills over into 3 words rather than squeezing into 2).


40 bit is 1TiB which should be fine for a limited subset of programs
(covering 99.99% ;-), right?
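
For the record, the arithmetic behind both numbers can be sketched as
below.  This assumes (my assumption, it is not spelled out above) that a
cons cell is three pointer-sized fields, a header plus head and tail, all
stored compressed, and that objects are padded up to whole 8-byte words.
The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with a pointer of the given width (width < 64). */
static uint64_t addressable_bytes(unsigned pointer_bits) {
    assert(pointer_bits < 64);
    return 1ULL << pointer_bits;
}

/* 8-byte words needed for an object made of `fields` compressed pointers,
 * rounded up so that the next object still starts on a word boundary. */
static unsigned words_needed(unsigned fields, unsigned pointer_bits) {
    unsigned bytes;
    assert(pointer_bits % 8 == 0);
    bytes = fields * (pointer_bits / 8);
    return (bytes + 7) / 8;
}
```

With these definitions words_needed(3, 40) is 2 and words_needed(3, 48) is
3, which is the cons-cell spill-over Simon mentions, and
addressable_bytes(40) is 2^40 bytes, i.e. 1TiB.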

> And pointer loads would be slower, as we'd have to mask the upper bits.
>
>
We can also load the pointer into the upper part of the register and do a
right shift.  That way we can optionally clear the tag bits in the same
operation, and instead use the data size modifier during the load. (If the
GHC wiki is correct in that pointer tags need to be cleared to dereference a
pointer, then there is no additional penalty for using a shift instead in
this case).
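
A rough model of the shift idea in plain C, not actual generated code.
It assumes 3 tag bits in the low bits of the 40-bit field and reads "data
size modifier" as the scale factor of the addressing mode (index * 8);
both are my reading, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FIELD_BITS 40u                 /* stored pointer width */
#define TAG_BITS   3u                  /* low bits used as pointer tag */
#define PAD_BITS   (64u - FIELD_BITS)  /* 24 junk bits below the field */

/* Model of a 64-bit load that leaves the 40-bit field in the upper bits.
 * The low 24 bits hold whatever neighbouring bytes came along. */
static uint64_t load_high(uint64_t field40, uint64_t junk) {
    assert(field40 < (1ULL << FIELD_BITS));
    return (field40 << PAD_BITS) | (junk & ((1ULL << PAD_BITS) - 1));
}

/* One shift drops the junk and keeps the tag bits. */
static uint64_t tagged_pointer(uint64_t reg) {
    return reg >> PAD_BITS;
}

/* A slightly longer shift drops junk and tag bits in one operation. */
static uint64_t word_index(uint64_t reg) {
    return reg >> (PAD_BITS + TAG_BITS);
}

/* What a scaled addressing mode (index * 8) would then compute. */
static uint64_t byte_address(uint64_t reg) {
    return word_index(reg) << TAG_BITS;
}
```

The point of the model is that no separate mask is needed: the right shift
discards the junk bits, and optionally the tag bits, by itself.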

If we do the above, we also get one tag bit into the carry flag for free, so
if one of the tag bits is used more frequently than the others, it should be
the highest tag bit.
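
To make the carry-flag point concrete, here is a small C model of SHR that
also reports the last bit shifted out, which is what the instruction leaves
in CF.  The layout (40-bit field in the upper bits of the register, 3 tag
bits at the bottom of the field, so a shift by 27) is an assumption, and
the type and function names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t value;  /* register contents after the shift */
    unsigned carry;  /* last bit shifted out, as SHR leaves it in CF */
} ShiftResult;

/* Logical right shift by `count` (1..63) that also models the carry flag. */
static ShiftResult shr_with_carry(uint64_t reg, unsigned count) {
    ShiftResult r;
    assert(count >= 1 && count <= 63);
    r.carry = (unsigned)((reg >> (count - 1)) & 1u);
    r.value = reg >> count;
    return r;
}
```

Shifting by 27 makes register bit 26 the last bit out, and in the assumed
layout that is the highest of the three tag bits.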

Alternatively, add one extra tag bit so we have 4 of them.  Then shift by 25
and the extra tag bit would be in CF at the cost of 0.5TiB memory space.
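
My reading of the four-tag-bit variant, as a C model: the extra tag bit
sits at the very bottom of the 40-bit field, so a shift by 25 pushes it
out last (into CF) and leaves a 39-bit tagged pointer with the usual 3 tag
bits.  The layout and the names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t tagged39;   /* 36-bit word index plus the usual 3 tag bits */
    unsigned extra_tag;  /* fourth tag bit, where SHR would leave CF */
} Split;

/* Build a register with the 40-bit field in its upper bits. */
static uint64_t pack_high(uint64_t tagged39, unsigned extra_tag) {
    assert(tagged39 < (1ULL << 39) && extra_tag <= 1u);
    return ((tagged39 << 1) | extra_tag) << 24;
}

/* Model of a single shift by 25: the last bit out is register bit 24. */
static Split split_field(uint64_t reg) {
    Split s;
    s.extra_tag = (unsigned)((reg >> 24) & 1u);
    s.tagged39 = reg >> 25;
    return s;
}

/* Address space left once one of the 40 bits is spent on the extra tag. */
static uint64_t max_heap_bytes(void) {
    return 1ULL << 39;
}
```

The remaining 39 bits address 2^39 bytes, which is where the 0.5TiB figure
comes from.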

It is easy to see that this can be expanded to 27 tag bits in the extreme
case at the cost of a MOV.

In 2013, the next Intel generation after Sandy Bridge will have a 3-argument
BEXTR instruction.  In that case one can load the required bits directly
from memory into a register, so there's an upgrade path here :-).  One will
need a register pre-populated with the required shift/extract information
though.
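
For illustration, a software model of what BEXTR computes, written in
portable C rather than with the real instruction.  The control word packs
the start bit in bits 7:0 and the length in bits 15:8, which is the
"pre-populated register" mentioned above.  The function names are made up,
and the example layout (tags in the low 3 bits of an unshifted field) is an
assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Build the control word: start bit in bits 7:0, length in bits 15:8. */
static uint64_t make_control(unsigned start, unsigned len) {
    assert(start < 256u && len < 256u);
    return (uint64_t)start | ((uint64_t)len << 8);
}

/* Extract `len` bits of `src` beginning at bit `start`, zero-extended. */
static uint64_t bextr_model(uint64_t src, uint64_t control) {
    unsigned start = (unsigned)(control & 0xFFu);
    unsigned len = (unsigned)((control >> 8) & 0xFFu);
    uint64_t shifted;
    if (start >= 64u || len == 0u) {
        return 0;
    }
    shifted = src >> start;
    if (len >= 64u) {
        return shifted;
    }
    return shifted & ((1ULL << len) - 1);
}
```

With make_control(3, 37) the 37-bit word index comes straight out of a
loaded 64-bit value, junk above bit 39 included, and make_control(0, 3)
gives the tag bits.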

We can actually have an arbitrary amount of aux material, because we can
have variable-length tag bits.
If we load the pointer into the top of the register, we can always shift by
a fixed amount to dereference the pointer.  However, when we need to access
tag bits, we can reserve a tag pattern that indicates that there are
additional tag bytes following the 40-bit pointer that we can inspect.  That
should be pretty powerful because it allows for variable-length tag bits.
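
One way this could look, purely as a sketch: the tag pattern 7 is reserved
as the escape, and a single extra tag byte follows the 5-byte pointer when
it is set.  The escape value, the one-byte extension, the little-endian
field layout, and all names are assumptions, not anything GHC does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ESCAPE_TAG 7u  /* hypothetical reserved pattern: "more tag follows" */

typedef struct {
    uint64_t address;  /* 40-bit, 8-byte-aligned address */
    unsigned tag;      /* the 3 inline tag bits */
    unsigned ext_tag;  /* extra tag byte, 0 when absent */
    size_t size;       /* bytes consumed: 5 or 6 */
} Decoded;

/* Write the field little-endian; returns the number of bytes written. */
static size_t encode_field(uint8_t *out, uint64_t address,
                           unsigned tag, unsigned ext_tag) {
    uint64_t field;
    int i;
    assert(address < (1ULL << 40) && (address & 7u) == 0);
    assert(tag <= 7u && ext_tag <= 0xFFu);
    field = address | tag;
    for (i = 0; i < 5; ++i) {
        out[i] = (uint8_t)(field >> (8 * i));
    }
    if (tag != ESCAPE_TAG) {
        return 5;
    }
    out[5] = (uint8_t)ext_tag;
    return 6;
}

/* Read the 5-byte field, then the extra byte only if the escape is set. */
static Decoded decode_field(const uint8_t *p) {
    Decoded d;
    uint64_t field = 0;
    int i;
    for (i = 4; i >= 0; --i) {
        field = (field << 8) | p[i];
    }
    d.tag = (unsigned)(field & 7u);
    d.address = field & ~7ULL;
    d.ext_tag = 0;
    d.size = 5;
    if (d.tag == ESCAPE_TAG) {
        d.ext_tag = p[5];
        d.size = 6;
    }
    return d;
}
```

Dereferencing never looks at the extra byte; only code that inspects tags
pays for the longer encoding.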

(Oh, and if we have variable-length tag bits, and some of the tag bits are
discardable, they could be discarded aggressively in the global heap by the
GC, but that would require lots of pointer fixups.. ouch)

Alexander
_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc
