On 4/9/07, J.C. Pizarro <[EMAIL PROTECTED]> wrote:
> 2007/4/9, Lawrence Crowl <[EMAIL PROTECTED]>:
> > On 4/7/07, Joe Buck <[EMAIL PROTECTED]> wrote:
> > > Consider an implementation that, when given
> > >
> > > Foo* array_of_foo = new Foo[n_elements];
> > >
> > > passes __compute_size(n_elements, sizeof Foo) instead of
> > > n_elements*sizeof Foo to operator new, where __compute_size is
> > >
> > > inline size_t __compute_size(size_t num, size_t size) {
> > >   // A wrapped product can still compare >= num, so test against
> > >   // ~size_t(0) / size instead; size is a compile-time constant
> > >   // (sizeof Foo), so this stays one compare and one branch.
> > >   return num > ~size_t(0) / size ? ~size_t(0) : num * size;
> > > }
> > >
> > > This counts on the fact that any operator new implementation has to
> > > fail when asked to supply every single addressable byte, less one.
> >
> > This statement is true only for linear address spaces. For segmented
> > address spaces, it is quite feasible to have a ~size_t(0) much smaller
> > than addressable memory.
> We're working in linear address spaces.
> What about segmented address spaces? Can you give me examples?
Intel has had several popular processors with segmented addresses
including the 8086, 80186, and 80286. (Actually, the 80386 and
successors are segmented, but the operating systems typically hide
that fact.) They also had the i432.
I think the IBM AS400 series may be segmented, though I'm not sure.
> > The optimization above would be wrong for such machines because
> > the allocation would be smaller than the requested size.
> Requesting a size of ~size_t(0) is requesting 0xFFFFFFFF or
> 0xFFFFFFFFFFFFFFFFULL bytes, and the allocator will always say
> "sorry, there is no memory of 0xFFFFFFFF or
> 0xFFFFFFFFFFFFFFFFULL bytes".
Except on a segmented machine, where it will say "here you go!"
On segmented architectures, sizeof(size_t) < sizeof(void*).
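For example, with the numbers of the 8086 large memory model (a worked
illustration in portable C++, arithmetic only, not a probe of a real
segmented target):

  #include <cstdio>

  int main() {
    // 8086 large memory model: size_t is 16 bits, but the bus has 20
    // address lines, so the machine addresses far more memory than any
    // single object can span.
    const unsigned long long max_request = 0xFFFFu;   // ~size_t(0) for 16-bit size_t
    const unsigned long long addressable = 1u << 20;  // 1 MiB of address space
    std::printf("clamped request: %llu bytes\n", max_request);
    std::printf("addressable:     %llu bytes\n", addressable);
    // max_request < addressable, so an allocator on such a target can
    // legitimately satisfy a ~size_t(0)-byte request ("here you go!"),
    // and clamping to ~size_t(0) no longer guarantees failure.
    return 0;
  }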
> > > It would appear that the extra cost, for the non-overflow case, is
> > > two instructions (on most architectures): the compare and the
> > > branch, which can be arranged so that the prediction is not-taken.
> >
> > That is the dynamic count. The static count, which could affect
> > density of cache use, should also include the alternate return value.
> With the Core Duo, the density of cache use is not our problem, because
> there are L2 caches of 2 MiB, 4 MiB, and even 6 MiB!
> Our main problem is to reach maximum performance in the future.
The L1 (or sometimes L0) caches are much smaller, and execution speed
can be affected by instruction cache pressure. One can avoid the effect,
but not without basic block outlining.
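A rough source-level sketch of that kind of outlining, using GCC's
cold/noinline attributes and __builtin_expect (one way to approximate
the effect; not how GCC implements new[] internally):

  #include <cstddef>

  // Cold, out-of-line failure path: noinline + cold ask GCC to place
  // this code away from the hot instruction stream, easing pressure on
  // the small L1 instruction cache.
  [[gnu::noinline, gnu::cold]]
  static std::size_t report_overflow() {
    return ~std::size_t(0);
  }

  inline std::size_t compute_size(std::size_t num, std::size_t size) {
    // Hot path: one compare and a branch predicted not-taken.
    if (__builtin_expect(num > ~std::size_t(0) / size, 0))
      return report_overflow();
    return num * size;
  }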
--
Lawrence Crowl