https://bugs.kde.org/show_bug.cgi?id=79362
--- Comment #73 from Philippe Waroquiers <philippe.waroqui...@skynet.be> ---
(In reply to Julian Seward from comment #72)
> (In reply to Philippe Waroquiers from comment #71)
> > Created attachment 107073 [details]
> > (hack) : patch that adds measurement code to scan the EC for a .so unload
>
> +   for (j = 0; j < ec->n_ips; j++) {
> +      if (UNLIKELY(ec->ips[j] >= from && ec->ips[j] <= to)) {
> +         break;
> +      }
> +   }
>
> This is a side-effect-free loop whose only computed value (j) is unused,
> and provably terminates.  I think it's likely that gcc noticed all 3 facts
> and deleted the loop.

Yes, you are correct.
I will attach a new patch which avoids the loop elimination.

With this, the scan is slower: for big applications (260_000 EC,
6_300_000 IP), a scan takes between 0.020 and 0.030 seconds.

So, an application that does 1000 load/unload will use roughly 20 to 30
seconds more cpu for the scanning.

Assuming these measurements are now OK, I think this still looks
acceptable, as:
* a large majority of the users will not use this feature, so we had
  better minimise the impact of this feature when it is not used
  (typically in memory)
* for users of the functionality, we had better make --track-origins
  functionally correct
* probably not many applications are doing load/unload at a high frequency.

If we then find an application that suffers heavily from this scanning,
we can always implement a 'lazy scan': instead of scanning all EC when a
lib is unloaded, we just scan an EC when the EC is symbolised (or when it
is 're-acquired' following the capture of a new stack trace).

The logic of this 'lazy scanning' will be:
  if EC epoch is not the current epoch
     then scan all DI that were unloaded between EC epoch and current epoch.
          if an address of the EC matches one of these DI,
             mark the EC as archived (i.e. it should not be used for
             anything other than output)
          else change EC epoch to be current epoch

With this, very few EC should have to be verified: only the 'active EC'
and the EC related to an error.

But I think we had better do the simple approach of scanning first, and
if a user of --keep-debuginfo complains about performance, optimise by
doing the lazy scanning.

--00:00:00:01.288 9148-- exectx: scanning 1000 times 515 contexts/5,133 ips
--00:00:00:01.294 9148-- exectx: finished scanning. Match 4 scanned 5,133,000 515 contexts/5,133 ips
--00:00:00:01.295 9151-- exectx: scanning 1000 times 1,027 contexts/11,277 ips
--00:00:00:01.309 9151-- exectx: finished scanning. Match 4 scanned 11,277,000 1,027 contexts/11,277 ips
--00:00:00:01.308 9154-- exectx: scanning 1000 times 2,051 contexts/24,589 ips
--00:00:00:01.345 9154-- exectx: finished scanning. Match 4 scanned 24,589,000 2,051 contexts/24,589 ips
--00:00:00:01.315 9157-- exectx: scanning 1000 times 4,099 contexts/53,261 ips
--00:00:00:01.417 9157-- exectx: finished scanning. Match 4 scanned 53,261,000 4,099 contexts/53,261 ips
--00:00:00:01.331 9160-- exectx: scanning 1000 times 8,195 contexts/114,701 ips
--00:00:00:01.576 9160-- exectx: finished scanning. Match 4 scanned 114,701,000 8,195 contexts/114,701 ips
--00:00:00:01.346 9163-- exectx: scanning 1000 times 16,387 contexts/245,773 ips
--00:00:00:01.860 9163-- exectx: finished scanning. Match 4 scanned 245,773,000 16,387 contexts/245,773 ips
--00:00:00:01.381 9166-- exectx: scanning 1000 times 32,771 contexts/524,301 ips
--00:00:00:02.576 9166-- exectx: finished scanning. Match 4 scanned 524,301,000 32,771 contexts/524,301 ips
--00:00:00:01.441 9169-- exectx: scanning 1000 times 65,539 contexts/1,114,125 ips
--00:00:00:05.809 9169-- exectx: finished scanning. Match 4 scanned 1,114,125,000 65,539 contexts/1,114,125 ips
--00:00:00:01.559 9172-- exectx: scanning 1000 times 131,076 contexts/2,359,327 ips
--00:00:00:15.639 9172-- exectx: finished scanning. Match 4 scanned 2,359,327,000 131,076 contexts/2,359,327 ips
--00:00:00:01.777 9175-- exectx: scanning 1000 times 262,149 contexts/4,980,787 ips
--00:00:00:28.796 9175-- exectx: finished scanning. Match 4 scanned 4,980,787,000 262,149 contexts/4,980,787 ips
--00:00:00:01.792 9178-- exectx: scanning 1000 times 262,149 contexts/5,242,933 ips
--00:00:00:32.021 9178-- exectx: finished scanning. Match 4 scanned 5,242,933,000 262,149 contexts/5,242,933 ips
--00:00:00:01.821 9182-- exectx: scanning 1000 times 262,149 contexts/6,291,514 ips
--00:00:00:21.031 9182-- exectx: finished scanning. Match 524296 scanned 6,291,514,000 262,149 contexts/6,291,514 ips

Note that I am slightly amazed by the fact that the last run is faster
than the two previous ones. I cannot explain this. (I checked, and the
matches are all due to the 'last ip' below main, which has a small
value.) So, the last run scans the same number of EC, but about
1_000_000 more IPs (1000 times), and is faster ???
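
For illustration only, the 'lazy scanning' logic described earlier in this comment could be sketched roughly as follows. This is not Valgrind code: the types `ExeContext` and `UnloadedDI`, the function names, and the epoch convention (each unload starts a new epoch, and a DI records the epoch its unload started) are all assumptions made up for the sketch. Note that the range check returns its result, so the compiler cannot discard the loop as dead code the way it did with the measurement hack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types: NOT Valgrind's real structures. */
typedef uintptr_t Addr;

typedef struct {
    Addr from, to;      /* address range the unloaded debuginfo (DI) covered */
    unsigned epoch;     /* assumed: epoch that began when this DI was unloaded */
} UnloadedDI;

typedef struct {
    const Addr *ips;    /* instruction pointers of the stack trace */
    size_t n_ips;
    unsigned epoch;     /* last epoch at which this EC was verified */
    bool archived;      /* true: EC may only be used for output */
} ExeContext;

/* True if any IP of the EC lies in [from, to]. The result is used by
   the caller, so the loop has an observable effect. */
bool ec_has_ip_in_range(const ExeContext *ec, Addr from, Addr to)
{
    for (size_t j = 0; j < ec->n_ips; j++) {
        if (ec->ips[j] >= from && ec->ips[j] <= to)
            return true;
    }
    return false;
}

/* Lazy check, meant to run when an EC is symbolised or re-acquired.
   Returns true if the EC is still usable, false if it is archived. */
bool ec_lazy_check(ExeContext *ec, const UnloadedDI *unloaded,
                   size_t n_unloaded, unsigned cur_epoch)
{
    assert(ec->epoch <= cur_epoch);
    if (ec->archived)
        return false;
    if (ec->epoch == cur_epoch)
        return true;            /* already verified for this epoch */

    /* Only DIs unloaded after the EC's epoch need to be examined. */
    for (size_t i = 0; i < n_unloaded; i++) {
        const UnloadedDI *di = &unloaded[i];
        if (di->epoch > ec->epoch && di->epoch <= cur_epoch
            && ec_has_ip_in_range(ec, di->from, di->to)) {
            ec->archived = true;
            return false;
        }
    }
    ec->epoch = cur_epoch;      /* no match: EC is valid in the current epoch */
    return true;
}
```

With this shape, an unload only has to append an entry to the list of unloaded DIs and bump the current epoch; the per-EC cost is paid only for the ECs that are actually touched afterwards.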