On Mon, Oct 8, 2012 at 10:26 PM, Vladimir Makarov <vmaka...@redhat.com> wrote: > I am not a fan of sbitmap for regular use.
Me neither, to be honest. (For the lra-eliminations.c bitmap it was a particularly bad choice :-) > This patch definitely helps for > this particular test. But it might hurt performance for small functions > because > > You need to allocate sbitmap (bitmap frequently uses already allocated pool > of bitmap elements), The insns stack is only allocated once per function, so I don't think there's any meaningful overhead in this case. > sometimes reallocate it (because new insns are added), I measured the number of times the lra_constraint_insn_stack_bitmap sbitmap had to be re-allocated. This happened at most once for functions with an INSN_UID greater than 127 (I chose this number because 128 bits is the number of bits a bitmap_element can hold). For smaller functions it happened more frequently. I think the re-allocation can be mostly avoided if I'd just pre-allocate with some slack (3/2*get_max_uid). > make it empty (zeroing), and it may hurt locality when the corresponding set > is partially sparsed (but not so sparsed as in this case, where each bit > practically requires a new bitmap element). The locality thing is true, but I doubt it matters much if you're looking at a bit when you're already going through chains and lists of RTL and other structures. Our bitmap_elements could be anywhere in memory, because the free-elements lists are not sorted to be contiguous in memory. Things improve a bit if you put a bitmap on its own obstack. If you have a function with a max. INSN_UID greater than 127 (almost all non-trivial functions) you already need two bitmap_elements to represent the stack membership. If you have >1000 insns, you'd need 8 bitmap_elements best-case, but my measurements indicate that you usually need at least twice that many because the INSN_UID sequences tend to be sparse (especially on targets with complex instructions (combine!) or many insn_splitters), and the INSN_UIDs in an insn sequence are usually not even close to sequential for a sufficiently large function (e.g. due to scheduling). So accessing the bitmap becomes a cache killer, even if the set is sparse. If allocated with a smarter strategy (e.g. pre-allocating some slack), the locality of an sbitmap may be even better than that of a bitmap_head. The zeroing is usually quite cheap (memset on contiguous memory) but not having to do so is obviously better. An sbitmap is still much cheaper than a SparseSet. Even a sparse sbitmap is often better than a sparse bitmap_head if you're doing random access on the set. If INSN_UIDs where less scattered, using a bitmap would probably be best. > So I checked it on big file with > hundred functionson Intel machine and got > > before a part of the patch implementing the insn stack as sbitmap > real=243.40 user=241.61 system=1.00 > > after the part of the patch: > real=244.89 user=243.02 system=0.96 Is that more than just noise, you think? A ~1.5s difference on ~240 total isn't very much. I measured the timings on my set of cc1-i files, and sometimes the without-patch compiler was faster by a tiny amount, and sometimes it was slower. Even on an average of 10 runs I really couldn't say that the patch was a win or loss on the whole. I measured this on a mostly idle machine at home, not gcc17, which seems to be even more busy than usual lately, thanks to you and me :-) > So I guess we need a combination of methods (based on sbitmap and bitmap) or > some another method could work in both cases. I think it can wait the next > stage. For this case I think we have no better set representation available than an sbitmap. But perhaps we need to think about this some more first... I was considering a pointer_set at first, but it looks like the order in which insns get visited is important and deleting an element from a pointer_set is not very efficient (the delete function doesn't exist yet, and the set representation is intentionally kept simple so it doesn't resize). Ciao! Steven