Andrew Haley writes:
> Jakub Jelinek writes:
>
> > > > While I still like using dl_iterate_phdr instead of
> > > > __register_frame_info_bases for totally aesthetic reasons, there
> > > > have been changes made to the dl_iterate_phdr interface since the
> > > > gcc support was written that would allow the dl_iterate_phdr
> > > > results to be cached.
> > >
> > > That would be nice.  Also, we could fairly easily build a tree of
> > > nodes, one for each loaded object, then we wouldn't be doing a linear
> > > search through them.  We could do that lazily, so it wouldn't kick in
> > > 'til needed.
> >
> > Here is a rough patch for what you can do.
>
> Thanks very much.  I'm working on it.

OK, I've roughed out a very simple patch and it certainly seems to
improve things.  Here's the before:

samples  cum. samples  %        cum. %   app name         symbol name
17962    17962         25.8164  25.8164  libgcc_s.so.1    _Unwind_IteratePhdrCallback
7019     24981         10.0882  35.9046  libc-2.3.3.so    dl_iterate_phdr
6966     31947         10.0121  45.9167  libgcc_s.so.1    read_encoded_value_with_base
3756     35703          5.3984  51.3151  libgcj.so.6.0.0  GC_mark_from
3643     39346          5.2360  56.5511  libgcc_s.so.1    search_object
2032     41378          2.9205  59.4717  libgcc_s.so.1    __i686.get_pc_thunk.bx
1555     42933          2.2350  61.7066  libgcj.so.6.0.0  _Jv_MonitorExit
1413     44346          2.0309  63.7375  libgcj.so.6.0.0  _Jv_MonitorEnter
1288     45634          1.8512  65.5887  libgcj.so.6.0.0  java::util::IdentityHashMap::hash(java::lang::Object*)

And here's the after:

samples  cum. samples  %        cum. %   app name         symbol name
7020     7020          14.7674  14.7674  libgcc_s.so.1    read_encoded_value_with_base
3808     10828          8.0106  22.7780  libgcc_s.so.1    _Unwind_IteratePhdrCallback
3680     14508          7.7413  30.5194  libgcj.so.6.0.0  GC_mark_from
3463     17971          7.2849  37.8042  libgcc_s.so.1    search_object
1587     19558          3.3385  41.1427  libgcj.so.6.0.0  _Jv_MonitorExit
1577     21135          3.3174  44.4601  libc-2.3.3.so    dl_iterate_phdr
1288     22423          2.7095  47.1696  libgcj.so.6.0.0  _Jv_MonitorEnter
1230     23653          2.5875  49.7570  libgcj.so.6.0.0  java::util::IdentityHashMap::hash(java::lang::Object*)

So, the time spent unwinding before was about 50% of the total
runtime, and after about 28%.  I measured a miss rate of 0.006% with
27 entries used.

Still, 28% is a heavy overhead.  I think it's because we're doing a
great many class lookups, and each lookup does a stack trace as a
security check.  I'll look at caching security contexts in libgcj.

Andrew.
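
For illustration only, here is a minimal sketch of the kind of caching discussed in this thread.  It is not the patch itself.  It assumes the interface change referred to is the pair of dlpi_adds/dlpi_subs counters that glibc reports in struct dl_phdr_info, and every other name in it (struct seg, find_segment, CACHE_SIZE, the hits and misses counters) is hypothetical.  The first callback invocation checks whether the counters have moved since the last lookup; if not, a small cache of segment ranges is searched before falling back to the linear walk over loaded objects.

```c
#define _GNU_SOURCE          /* for dl_iterate_phdr and struct dl_phdr_info */
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 8         /* arbitrary size chosen for this sketch */

struct seg { uintptr_t lo, hi; };        /* one loaded segment, [lo, hi) */

static struct seg cache[CACHE_SIZE];
static int cache_used, cache_next;       /* entries in use, next slot to fill */
static unsigned long long last_adds, last_subs;
static unsigned long hits, misses;       /* counters for measuring miss rate */

struct query { uintptr_t pc; int first; struct seg result; };

static int callback(struct dl_phdr_info *info, size_t size, void *ptr)
{
    struct query *q = ptr;

    if (q->first) {
        q->first = 0;
        /* Older callers of this interface may pass a shorter struct. */
        int have_counters = size >= offsetof(struct dl_phdr_info, dlpi_subs)
                                    + sizeof info->dlpi_subs;
        if (have_counters && info->dlpi_adds == last_adds
                          && info->dlpi_subs == last_subs) {
            /* Nothing loaded or unloaded since last time: try the cache. */
            for (int i = 0; i < cache_used; i++)
                if (q->pc >= cache[i].lo && q->pc < cache[i].hi) {
                    q->result = cache[i];
                    hits++;
                    return 1;            /* non-zero stops the iteration */
                }
        } else {
            /* Object list changed (or counters unavailable): drop the cache. */
            cache_used = cache_next = 0;
            if (have_counters) {
                last_adds = info->dlpi_adds;
                last_subs = info->dlpi_subs;
            }
        }
        misses++;
    }

    /* Slow path: linear scan of this object's PT_LOAD segments. */
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        uintptr_t lo = info->dlpi_addr + ph->p_vaddr;
        uintptr_t hi = lo + ph->p_memsz;
        if (q->pc >= lo && q->pc < hi) {
            q->result.lo = lo;
            q->result.hi = hi;
            cache[cache_next] = q->result;          /* round-robin replacement */
            cache_next = (cache_next + 1) % CACHE_SIZE;
            if (cache_used < CACHE_SIZE)
                cache_used++;
            return 1;
        }
    }
    return 0;                            /* keep iterating over objects */
}

/* Returns 1 and fills *out if pc lies in a loaded segment, else 0. */
static int find_segment(const void *pc, struct seg *out)
{
    struct query q = { (uintptr_t)pc, 1, { 0, 0 } };
    assert(out != NULL);
    if (!dl_iterate_phdr(callback, &q))
        return 0;
    *out = q.result;
    return 1;
}
```

Calling find_segment twice with the same address should take the slow path once and hit the cache the second time, so the ratio of misses to total lookups gives a miss rate of the sort quoted above.  The sketch is not written with thread safety in mind, and it caches only address ranges; a real unwinder would also keep whatever per-object data it needs to locate the FDEs.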