Hi,

Currently we have two similar but distinct approaches to measuring memory
usage in Servo.

- Servo uses the heapsize and heapsize_derive crates, from crates.io.

- Gecko uses the malloc_size_of and malloc_size_of_derive crates, which are
  in the tree.

Because of this, you see this pattern quite a bit in style code:

> #[cfg_attr(feature = "gecko", derive(MallocSizeOf))]
> #[cfg_attr(feature = "servo", derive(HeapSizeOf))]
> struct Foo {
>     ...
> }

Why the difference? heapsize is the original approach. It sorta works, but
has some big flaws. malloc_size_of is a redesign that addresses these flaws.

- heapsize assumes you only want a single number for the size of any data
  structure, when sometimes you want to break it down into different
  buckets.

  malloc_size_of provides both "shallow" and "deep" measurements, which give
  greater flexibility and help with the multi-bucket case.

- heapsize assumes jemalloc is the allocator. This causes build problems in
  some configurations, e.g. https://github.com/servo/heapsize/issues/80.
  It also means it doesn't integrate with DMD, which is the tool we use to
  identify heap-unclassified memory in Firefox.

  malloc_size_of doesn't assume a particular allocator. You pass in
  functions that measure heap allocations. This avoids the build problems
  and also allows integration with DMD.

- heapsize doesn't measure HashMap/HashSet properly -- it computes an
  estimate of the size, instead of getting the true size from the allocator.
  This estimate can be (and in practice often will be) an underestimate.

  malloc_size_of does measure HashMap/HashSet properly. However, this
  requires that the allocator provide a function that can measure the size
  of an allocation from an interior pointer. (Unlike Vec, HashMap/HashSet
  don't provide a function that gives the raw pointer to the storage.) I had
  to add support for this to mozjemalloc, and vanilla jemalloc doesn't
  support it. (I guess we could fall back to computing the size when the
  allocator doesn't support this, e.g. for Servo, which uses vanilla
  jemalloc.)

- heapsize doesn't measure Rc/Arc properly -- currently it just defaults to
  measuring through the Rc/Arc, which can lead to double-counting,
  especially when you use derive, where it's easy to overlook that Rc/Arc
  typically need special handling. (This is
  https://github.com/servo/heapsize/issues/37.)

  malloc_size_of does measure Rc/Arc properly. It lets you provide a table
  that tracks which pointers have already been measured, which is used to
  prevent double-counting. malloc_size_of also doesn't implement the
  standard MallocSizeOf trait for Rc/Arc, which means they can't be
  unintentionally measured via derive. (You can still use derive if you
  explicitly choose to ignore all Rc/Arc fields, however.)
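
To make the design points above concrete, here is a rough, self-contained
sketch of the general shape: a caller-supplied measurement function, separate
shallow and deep measurements, and a table of already-seen pointers for Rc.
This is not the actual malloc_size_of code. Every name in it (SizeOfOps,
ShallowSizeOf, DeepSizeOf, rc_size_of_unique, fake_size_of) is made up for
illustration, and the "allocator" is a stub that pretends every block is 64
bytes.

```rust
use std::collections::HashSet;
use std::mem::size_of;
use std::rc::Rc;

/// Bundles the caller-supplied measurement function with the table of
/// pointers that have already been measured. Hypothetical type.
struct SizeOfOps {
    /// Returns the size of the heap block at `ptr`; supplied by the embedder,
    /// so nothing here assumes a particular allocator.
    size_of_block: fn(*const u8) -> usize,
    /// Pointers already counted, used to avoid double-counting shared data.
    seen: HashSet<*const u8>,
}

impl SizeOfOps {
    fn new(size_of_block: fn(*const u8) -> usize) -> Self {
        SizeOfOps { size_of_block, seen: HashSet::new() }
    }

    /// Measures one heap block through the pluggable function.
    fn block(&self, ptr: *const u8) -> usize {
        (self.size_of_block)(ptr)
    }

    /// Records `ptr` and returns true if it had been recorded before.
    fn already_seen(&mut self, ptr: *const u8) -> bool {
        !self.seen.insert(ptr)
    }
}

/// Shallow: only the heap block this value owns directly.
trait ShallowSizeOf {
    fn shallow_size_of(&self, ops: &SizeOfOps) -> usize;
}

/// Deep: the directly owned block plus everything reachable from it.
trait DeepSizeOf {
    fn deep_size_of(&self, ops: &mut SizeOfOps) -> usize;
}

impl<T> ShallowSizeOf for Vec<T> {
    fn shallow_size_of(&self, ops: &SizeOfOps) -> usize {
        // A Vec with no capacity, or of zero-sized elements, owns no block.
        if self.capacity() == 0 || size_of::<T>() == 0 {
            0
        } else {
            ops.block(self.as_ptr() as *const u8)
        }
    }
}

impl<T: DeepSizeOf> DeepSizeOf for Vec<T> {
    fn deep_size_of(&self, ops: &mut SizeOfOps) -> usize {
        let mut total = self.shallow_size_of(ops);
        for item in self {
            total += item.deep_size_of(ops);
        }
        total
    }
}

impl DeepSizeOf for u32 {
    fn deep_size_of(&self, _ops: &mut SizeOfOps) -> usize {
        0 // Plain integers own no heap memory.
    }
}

/// Rc deliberately has no DeepSizeOf impl in this sketch; measuring through
/// one is an explicit choice, and the seen-table prevents double-counting.
/// Note: Rc::as_ptr points inside the Rc allocation, so a real allocator hook
/// would need the block's start address or interior-pointer support.
fn rc_size_of_unique<T: DeepSizeOf>(rc: &Rc<T>, ops: &mut SizeOfOps) -> usize {
    let ptr = Rc::as_ptr(rc) as *const u8;
    if ops.already_seen(ptr) {
        return 0;
    }
    ops.block(ptr) + (**rc).deep_size_of(ops)
}

/// Stub measurement function: pretends every heap block is 64 bytes.
fn fake_size_of(_ptr: *const u8) -> usize {
    64
}

/// Returns (first visit of a shared Rc, second visit, shallow size of a Vec).
fn demo() -> (usize, usize, usize) {
    let mut ops = SizeOfOps::new(fake_size_of);
    let shared = Rc::new(vec![1u32, 2, 3]);
    let alias = Rc::clone(&shared);
    let first = rc_size_of_unique(&shared, &mut ops); // Rc block + Vec buffer
    let second = rc_size_of_unique(&alias, &mut ops); // already measured
    let shallow = vec![1u32].shallow_size_of(&ops);
    (first, second, shallow)
}
```

With the stub above, demo() returns (128, 0, 64): the shared Rc is counted
once (its own block plus the Vec's buffer), the second handle contributes
nothing, and the shallow measurement covers just the one buffer. Swapping
fake_size_of for a function backed by a real allocator is the only change
needed to get real numbers, which is the point of passing the function in.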

Basically, malloc_size_of is heapsize done right, and using both is silly. I
went with this dual-track approach while adding memory reporting to Stylo
because time was tight and the exact design choices required to handle all
the necessary cases weren't clear. But now that things have settled down I'd
like to pay back this technical debt by removing the duplication.

I see two options.

- Overwrite the heapsize crate on crates.io with the malloc_size_of code. So
  the crate name wouldn't change, but the API would change significantly,
  and it would still be on crates.io. Then switch Servo over to using
  heapsize everywhere.

- Switch Servo over to using malloc_size_of everywhere. (This leaves open
  the question of what should happen to the heapsize crate.)

I personally prefer the second option, mostly because I view all of this
code as basically unstable -- much like the allocator APIs in Rust itself --
and publishing it on crates.io makes me uneasy. Also, keeping the code in
the tree makes it easier to modify.

Thoughts?

Nick
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo