Eric Blake wrote: > Jim Meyering <jim <at> meyering.net> writes: ... > My code inspection didn't turn up obvious problems, but I have not yet tested > with USE_OBSTACK either, so I wouldn't be surprised to find bit rot. If we do > get it working, would it be worth making the use of obstacks a runtime > decision > via hash_tuning, rather than a compile-time decision?
Code size is one potential reason not to make it a run-time decision. Consider an application that doesn't otherwise use obstacks and whose hash table usage is unlikely to benefit from it. All they'd get is the code size penalty. > There might also be alternative implementations possible with better > performance. For example, instead of malloc'ing free entries one at a time > and > tracking bucket overflows as pointers to malloc'd objects, we could instead > malloc/realloc an array of overflows, with bucket overflows tracked as indices > into that array. This would provide some of the same benefits as obstack > allocation (no per-entry malloc overhead, and fewer calls to malloc and with > larger chunks of malloc claimed per call), such that it is no longer necessary > to use an obstack for free-entry management in the first place. I've found it useful to have one malloc'd/free'd buffer per entry when debugging memory corruption problems. For performance/production, I would use the obstack-based code, and when testing, I'd compile/test both with and without USE_OBSTACK. > Another hash implementation I am familiar with treats the slots array as a > circular array, and has no overflow linked lists (thus, the entire hash table > is contained within a single malloc'd block, rather than the hash.c > implementation of one block for bucket heads and other blocks for collisions). > Instead, the hasher merely determines where in the circular array to start > searching, and on collisions, you end up searching until you hit a free slot. > But this has other ramifications; searching for a match means traversing > through the array until you encounter an empty slot, and you no longer have > the > guarantee that the comparison function will always be called with two elements > with the same hash. Also, whereas linked lists only search the set of > collisions, a nearly full circular array may end up searching the bulk of the > table before it arrives at the next free entry. However, I have never > benchmarked this implementation to see how it fares when compared with the > more > traditional bucket of linked lists. Performance-improving patches are always welcome ;-) However, finding a sufficiently representative set of applications on which to perform benchmarks may be a challenge, considering the variety of hash functions and differing sizes of objects in the table. > We may also want to borrow from Bruno's idioms in the gl_list arena, where > hash_initialize is implemented as setting up a vtable optimized to an > appropriate implementation, such that you could then experiment with different > implementations by swapping the vtable at initialization time. > >> >> I've begun writing a test module, and may even find the time to test >> the USE_OBSTACK code, too. As you say, it also needs to be improved >> to handle allocation failure, probably the same way remove.c now does: >> >> http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a5111af33ea6a5d27 > > Unrelated to this topic, but in looking at that patch, I noticed you added the > following line: > > cleanup:; > > Any reason for the empty statement after the label? While it appears not to have been the case here, I've sometimes had to add a semicolon after a label's colon because otherwise the label would refer to a following declaration, and that's not valid C. A label must refer to a "statement", not a declaration, so if I always use a semicolon, I don't have to think about it. > Is this worth a syntax check in maint.mk? Might be hard to get right in code that allows decl-after-stmt.