Hi,

Currently we have two similar but distinct approaches to measuring memory usage in Servo.

- Servo uses the heapsize and heapsize_derive crates, from crates.io.

- Gecko uses the malloc_size_of and malloc_size_of_derive crates, which are in the tree.

Because of this, you see this pattern quite a bit in style code:

> #[cfg_attr(feature = "gecko", derive(MallocSizeOf))]
> #[cfg_attr(feature = "servo", derive(HeapSizeOf))]
> struct Foo {
>     ...
> }

Why the difference? heapsize is the original approach. It sorta works, but has some big flaws. malloc_size_of is a redesign that addresses these flaws.

- heapsize assumes you only want a single number for the size of any data structure, when sometimes you want to break it down into different buckets.

  malloc_size_of provides both "shallow" and "deep" measurements, which give greater flexibility and help with the multi-bucket case.

- heapsize assumes jemalloc is the allocator. This causes build problems in some configurations, e.g. https://github.com/servo/heapsize/issues/80. It also means it doesn't integrate with DMD, which is the tool we use to identify heap-unclassified memory in Firefox.

  malloc_size_of doesn't assume a particular allocator. You pass in functions that measure heap allocations. This avoids the build problems and also allows integration with DMD.

- heapsize doesn't measure HashMap/HashSet properly -- it computes an estimate of the size, instead of getting the true size from the allocator. This estimate can be (and in practice often will be) an underestimate.

  malloc_size_of does measure HashMap/HashSet properly. However, this requires that the allocator provide a function that can measure the size of an allocation from an interior pointer. (Unlike Vec, HashMap/HashSet don't provide a function that gives the raw pointer to the storage.) I had to add support for this to mozjemalloc, and vanilla jemalloc doesn't support it. (I guess we could fall back to computing the size when the allocator doesn't support this, e.g. for Servo, which uses vanilla jemalloc.)

- heapsize doesn't measure Rc/Arc properly -- currently it just defaults to measuring through the Rc/Arc, which can lead to double-counting, especially when you use derive, where it's easy to overlook that Rc/Arc typically need special handling. (This is https://github.com/servo/heapsize/issues/37.)

  malloc_size_of does measure Rc/Arc properly. It lets you provide a table that tracks which pointers have already been measured, which is used to prevent double-counting. malloc_size_of also doesn't implement the standard MallocSizeOf trait for Rc/Arc, which means they can't be unintentionally measured via derive. (You can still use derive if you explicitly choose to ignore all Rc/Arc fields, however.)

Basically, malloc_size_of is heapsize done right, and using both is silly. I went with this dual-track approach while adding memory reporting to Stylo because time was tight and the exact design choices required to handle all the necessary cases weren't clear. But now that things have settled down I'd like to pay back this technical debt by removing the duplication.

I see two options.

- Overwrite the heapsize crate on crates.io with the malloc_size_of code. So the crate name wouldn't change, but the API would change significantly, and it would still be on crates.io. Then switch Servo over to using heapsize everywhere.

- Switch Servo over to using malloc_size_of everywhere. (This leaves open the question of what should happen to the heapsize crate.)

I personally prefer the second option, mostly because I view all of this code as basically unstable -- much like the allocator APIs in Rust itself -- and publishing it on crates.io makes me uneasy. Also, keeping the code in the tree makes it easier to modify.

Thoughts?

Nick
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo
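
For readers who want the allocator and Rc/Arc points made concrete, here is a rough sketch of the design described above. It is not the real malloc_size_of API: MeasureOps, ShallowSize, DeepSize, rc_size_once, Sheet and the fixed 64-byte fake_block_size are all made-up names and values, chosen only so the example runs without depending on any particular allocator. It assumes the caller passes in the block-measuring function, that shallow and deep measurements are separate traits, and that Rc is measured only through an explicit helper backed by a table of already-seen pointers.

```rust
use std::collections::HashSet;
use std::mem::size_of;
use std::os::raw::c_void;
use std::rc::Rc;

/// Hypothetical measurement context. The caller supplies the function that
/// reports the size of a heap block, so nothing here assumes an allocator.
struct MeasureOps {
    size_of_block: fn(*const c_void) -> usize,
    /// Pointers already measured, used to avoid double-counting shared data.
    seen: HashSet<*const c_void>,
}

impl MeasureOps {
    fn new(size_of_block: fn(*const c_void) -> usize) -> Self {
        MeasureOps { size_of_block, seen: HashSet::new() }
    }

    fn block_size<T>(&self, ptr: *const T) -> usize {
        (self.size_of_block)(ptr as *const c_void)
    }

    /// Returns true only the first time a given pointer is recorded.
    fn first_visit<T>(&mut self, ptr: *const T) -> bool {
        self.seen.insert(ptr as *const c_void)
    }
}

/// "Shallow": only the heap block owned directly by the value.
trait ShallowSize {
    fn shallow_size(&self, ops: &MeasureOps) -> usize;
}

/// "Deep": the value's own heap block plus everything reachable from it.
trait DeepSize {
    fn deep_size(&self, ops: &mut MeasureOps) -> usize;
}

impl DeepSize for u32 {
    fn deep_size(&self, _ops: &mut MeasureOps) -> usize {
        0 // no heap storage
    }
}

impl<T> ShallowSize for Vec<T> {
    fn shallow_size(&self, ops: &MeasureOps) -> usize {
        // An empty Vec holds a dangling pointer that must not be measured.
        if self.capacity() == 0 || size_of::<T>() == 0 {
            0
        } else {
            ops.block_size(self.as_ptr())
        }
    }
}

impl<T: DeepSize> DeepSize for Vec<T> {
    fn deep_size(&self, ops: &mut MeasureOps) -> usize {
        let mut total = self.shallow_size(ops);
        for item in self {
            total += item.deep_size(ops);
        }
        total
    }
}

/// Rc deliberately gets no DeepSize impl; it must be measured explicitly.
/// Rc::as_ptr points at the value inside the Rc's heap block, which is enough
/// for the fake measurer below; a real allocator would need the block start.
fn rc_size_once<T: DeepSize>(rc: &Rc<T>, ops: &mut MeasureOps) -> usize {
    let ptr = Rc::as_ptr(rc);
    if ops.first_visit(ptr) {
        ops.block_size(ptr) + (**rc).deep_size(ops)
    } else {
        0 // already counted via another Rc handle
    }
}

struct Sheet {
    rules: Vec<u32>,
    shared: Rc<Vec<u32>>,
}

impl DeepSize for Sheet {
    fn deep_size(&self, ops: &mut MeasureOps) -> usize {
        self.rules.deep_size(ops) + rc_size_once(&self.shared, ops)
    }
}

/// Fake measurer for the demo: pretends every heap block is 64 bytes.
fn fake_block_size(_ptr: *const c_void) -> usize {
    64
}

/// Returns (shallow size of a.rules, deep size of a, deep size of b).
fn demo() -> (usize, usize, usize) {
    let shared = Rc::new(vec![1u32, 2, 3]);
    let a = Sheet { rules: vec![1, 2], shared: Rc::clone(&shared) };
    let b = Sheet { rules: vec![3], shared };
    let mut ops = MeasureOps::new(fake_block_size);
    let shallow = a.rules.shallow_size(&ops);
    let size_a = a.deep_size(&mut ops);
    let size_b = b.deep_size(&mut ops);
    (shallow, size_a, size_b)
}

fn main() {
    let (shallow, size_a, size_b) = demo();
    println!("shallow(a.rules) = {shallow}, deep(a) = {size_a}, deep(b) = {size_b}");
}
```

With the fake measurer the two sheets total 256 bytes. Measuring through the Rc both times would have reported 384, which is the double-counting problem described above.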