Re: [lra] 3rd patch to speed more compilation of PR54146

Steven Bosscher Mon, 08 Oct 2012 14:15:16 -0700

On Mon, Oct 8, 2012 at 10:26 PM, Vladimir Makarov <vmaka...@redhat.com> wrote:
> I am not a fan of sbitmap for regular use.


Me neither, to be honest. (For the lra-eliminations.c bitmap it was a
particularly bad choice :-)


>  This patch definitely helps for
> this particular test.  But it might hurt performance for small functions
> because
>
> You need to allocate sbitmap (bitmap frequently uses already allocated pool
> of bitmap elements),

The insns stack is only allocated once per function, so I don't think
there's any meaningful overhead in this case.


> sometimes reallocate it (because new insns are added),

I measured the number of times the lra_constraint_insn_stack_bitmap
sbitmap had to be re-allocated. This happened at most once for
functions with an INSN_UID greater than 127 (I chose this number
because 128 bits is the number of bits a bitmap_element can hold). For
smaller functions it happened more frequently. I think the
re-allocation can be mostly avoided if I'd just pre-allocate with some
slack (3/2*get_max_uid).


> make it empty (zeroing), and it may hurt locality when the corresponding set
> is partially sparsed (but not so sparsed as in this case, where each bit
> practically requires a new bitmap element).

The locality thing is true, but I doubt it matters much if you're
looking at a bit when you're already going through chains and lists of
RTL and other structures. Our bitmap_elements could be anywhere in
memory, because the free-elements lists are not sorted to be
contiguous in memory. Things improve a bit if you put a bitmap on its
own obstack. If you have a function with a max. INSN_UID greater than
127 (almost all non-trivial functions) you already need two
bitmap_elements to represent the stack membership. If you have >1000
insns, you'd need 8 bitmap_elements best-case, but my measurements
indicate that you usually need at least twice that many because the
INSN_UID sequences tend to be sparse (especially on targets with
complex instructions (combine!) or many insn_splitters), and the
INSN_UIDs in an insn sequence are usually not even close to sequential
for a sufficiently large function (e.g. due to scheduling). So
accessing the bitmap becomes a cache killer, even if the set is
sparse.

If allocated with a smarter strategy (e.g. pre-allocating some slack),
the locality of an sbitmap may be even better than that of a
bitmap_head.

The zeroing is usually quite cheap (memset on contiguous memory) but
not having to do so is obviously better. An sbitmap is still much
cheaper than a SparseSet. Even a sparse sbitmap is often better than a
sparse bitmap_head if you're doing random access on the set. If
INSN_UIDs where less scattered, using a bitmap would probably be best.


> So I checked it on big file with > hundred functionson Intel machine and got
>
> before a part of the patch implementing the insn stack as sbitmap
> real=243.40 user=241.61 system=1.00
>
> after the part of the patch:
> real=244.89 user=243.02 system=0.96

Is that more than just noise, you think? A ~1.5s difference on ~240
total isn't very much. I measured the timings on my set of cc1-i
files, and sometimes the without-patch compiler was faster by a tiny
amount, and sometimes it was slower. Even on an average of 10 runs I
really couldn't say that the patch was a win or loss on the whole.

I measured this on a mostly idle machine at home, not gcc17, which
seems to be even more busy than usual lately, thanks to you and me :-)


> So I guess we need a combination of methods (based on sbitmap and bitmap) or
> some another method could work in both cases.  I think it can wait the next
> stage.

For this case I think we have no better set representation available
than an sbitmap. But perhaps we need to think about this some more
first...

I was considering a pointer_set at first, but it looks like the order
in which insns get visited is important and deleting an element from a
pointer_set is not very efficient (the delete function doesn't exist
yet, and the set representation is intentionally kept simple so it
doesn't resize).

Ciao!
Steven

Re: [lra] 3rd patch to speed more compilation of PR54146

Reply via email to