On Thu, Jul 12, 2018 at 08:56:28AM -0700, Andrew McCreight wrote:
On Thu, Jul 12, 2018 at 3:57 AM, Emilio Cobos Álvarez <emi...@crisal.io> wrote:

Thanks for doing this!

Just curious, is there a bug on file to measure excess capacity on
nsTArrays and hash tables?

njn looked at that kind of issue at some point (he changed how arrays grow,
for instance, to reduce overhead), but it has probably been around 5 years,
so there may be room for improvement for things added in the meanwhile.
However, our focus here is really on reducing per-process memory overhead,
rather than generic memory improvements, because we've had a lot of focus
on the latter as part of MemShrink, but not the former, so there's likely
easier improvements to be had.

I kind of suspect that improving the storage efficiency of hashtables (and probably nsTArrays too) will have an out-sized effect on per-process memory. Just at startup, for a mostly empty process, we have a huge amount of memory devoted to hashtables that would otherwise be shared across a bunch of origins—enough that removing just 4 bytes of padding per entry would save 87K per process. And that number tends to grow as we populate caches that we need for things like layout and atoms.
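
For a sense of scale, the 87K figure can be turned back into an entry count. This is only a back-of-the-envelope sketch: it assumes "87K" means 87 KiB and that the saving is exactly 4 bytes per entry, and the function name is made up for illustration.

```python
# Back-of-the-envelope check. Assumption: "87K" means 87 KiB.
PADDING_BYTES_PER_ENTRY = 4
SAVINGS_BYTES = 87 * 1024


def implied_entry_count(savings_bytes: int, bytes_per_entry: int) -> int:
    """Entries needed for a per-entry saving to add up to savings_bytes."""
    return savings_bytes // bytes_per_entry


entries = implied_entry_count(SAVINGS_BYTES, PADDING_BYTES_PER_ENTRY)
print(f"{entries} hashtable entries per process")
```

Under those assumptions, that works out to a little over 22,000 hashtable entries in a mostly empty process.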

As much as I'd like to be able to share many of those caches between processes, we are always going to need process-specific hashtables on top of the shared ones for things that can't be/shouldn't be/aren't yet shared. And that extra overhead tends to grow proportionally to the number of processes we have.

On 07/10/2018 08:19 PM, Kris Maglione wrote:

Welcome to the first edition of the Fission MemShrink newsletter.[1]

In this edition, I'll sum up what the project is, and why it matters to
you. In subsequent editions, I'll give updates on progress that we've made,
and areas that we'll need to focus on next.[2]


The Fission MemShrink project is one of the most easily overlooked
aspects of Project Fission (also known as Site Isolation), but is
absolutely critical to its success, and will require a company- and
community-wide effort to meet its goals.

The problem is thus: In order for site isolation to work, we need to be
able to run *at least* 100 content processes in an average Firefox session.
Each of those processes has its own base memory overhead—memory we use just
for creating the process, regardless of what's running in it. In the
post-Fission world, that overhead needs to be less than 10MB per process in
order to keep the extra overhead from Fission below 1GB. Right now, on our
best-case platform, Windows 10, it is somewhere between 17 and 21MB. Linux and
OS X hover between 25 and 35MB. In other words, between 2 and 3.5GB for an
ordinary session.
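
The arithmetic behind those figures can be sketched as follows. The numbers are the ones quoted above. The names, and the choice of 1 GB = 1000 MB to match the round numbers, are assumptions made for illustration.

```python
# Figures quoted in the newsletter; 1 GB is treated as 1000 MB (assumption).
PROCESSES = 100
TARGET_MB = 10

current_mb = {
    "Windows 10": (17, 21),
    "Linux / OS X": (25, 35),
}


def session_overhead_gb(per_process_mb: float, processes: int = PROCESSES) -> float:
    """Total base overhead in GB for a session with `processes` content processes."""
    return per_process_mb * processes / 1000


print(f"target: {session_overhead_gb(TARGET_MB):.1f} GB")
for platform, (low, high) in current_mb.items():
    print(
        f"{platform}: {session_overhead_gb(low):.1f}-{session_overhead_gb(high):.1f} GB, "
        f"needs {low - TARGET_MB}-{high - TARGET_MB} MB cut per process"
    )
```

The 7MB lower bound on the Windows 10 line is the same best-case reduction that the next paragraph works from.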

That means that, in the best case, we need to reduce the memory we use in
content processes by *at least* 7MB. The problem, of course, is that there
are only so many places we can cut memory without losing functionality, and
even fewer places where we can make big wins. But, there are lots of places
we can make small and medium-sized wins.

So, to put the task into perspective, of all of the places we can cut a
certain amount of overhead, here is the number of each size that we need to
fix in order to reach 1MB:

250KB:   4
100KB:  10
75KB:   13
50KB:   20
20KB:   50
10KB:  100
5KB:   200

Now remember: we need to do *all* of these in order to reach our goal.
It's not a matter of one 250KB improvement or 50 5KB improvements. It's 4
250KB *and* 200 5KB improvements. There just aren't enough places we can
cut 250KB. If we fall short in any of those areas, Project Fission will
fail, and Firefox will be the only major browser without site isolation.
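
The table above can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions: 1MB is treated as 1000KB, and counts are rounded to the nearest whole fix (which is how the 75KB row comes out as 13). The names are illustrative.

```python
# Fix sizes (KB) from the table. Assumption: 1 MB is treated as 1000 KB.
FIX_SIZES_KB = [250, 100, 75, 50, 20, 10, 5]
TARGET_KB = 1000


def fixes_needed(size_kb: int, target_kb: int = TARGET_KB) -> int:
    """Fixes of a given size needed to save roughly target_kb (nearest whole fix)."""
    return round(target_kb / size_kb)


table = {size: fixes_needed(size) for size in FIX_SIZES_KB}
total_fixes = sum(table.values())
total_saved_mb = sum(size * count for size, count in table.items()) / 1000

for size, count in table.items():
    print(f"{size}KB: {count}")
print(f"{total_fixes} fixes, about {total_saved_mb:.1f} MB per process")
```

Doing every row adds up to roughly 7MB per process, which appears to line up with the best-case reduction mentioned earlier. That seems to be why all of the rows are needed rather than any one of them.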

But it won't fail, because all of you are awesome, and this is a totally
achievable goal if we all throw our effort behind it.

Essentially what this means, though, is that if we identify an area of
overhead that's 50KB[3] or larger that can be eliminated, it *has* to be
eliminated. There just aren't that many large chunks to remove. They all
need to go. And if an area of code has a dozen 5KB chunks that can be
eliminated, maybe they don't all have to go, but at least half of them do.
The more the better.


To help us triage these issues, we have a tracking bug
(https://bugzil.la/memshrink-content), and a per-bug whiteboard tag
([overhead:...]) which gives an estimate of how much per-process overhead
we believe fixing that bug would eliminate. Please feel free to add
blockers to the tracking bug if you think they're relevant, and to add or
update [overhead] tags if you have reasonable estimates.


With all of that said, here's a brief update of the progress we've made
so far:

In the past month, unique memory per process[4] has dropped 3-4MB[5], and
JS memory usage in particular has dropped 1.1-1.9MB.

Particular credit goes to:

* Eric Rahm added an AWSY test suite to track base content process memory
   (https://bugzil.la/1442361). Results:

    Resident unique: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684862,1,4&series=mozilla-central,1684846,1,4&series=mozilla-central,1685133,1,4&series=mozilla-central,1685127,1,4
    Explicit allocations: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1706218,1,4&series=mozilla-inbound,1706220,1,4&series=mozilla-inbound,1706216,1,4
    JS: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684866,1,4&series=mozilla-central,1685137,1,4&series=mozilla-central,1685131,1,4

* Andrew McCreight created a tool for tracking JS memory usage, and figuring
   out which scripts and objects are responsible for how much of it
   (https://bugzil.la/1463569).

* Andrew and Nika Layzell also completely rewrote the way we handle XPIDL type
   info so that it's statically compiled into the executable and shared between
   all processes (https://bugzil.la/1438688, https://bugzil.la/1444745).

* Felipe Gomes split a bunch of code out of frame scripts so that it could be
   lazily loaded only when needed (https://bugzil.la/1467278, ...) and added a
   whitelist of JSMs that are allowed to be loaded at content process startup
   (https://bugzil.la/1471066)

* I did a bit of this too, and also prevented us from loading some other JSMs
   before we need them (https://bugzil.la/1470333, https://bugzil.la/1469719,
   ...)

* Nick Nethercote made dynamic nsAtoms allocate their string storage inline
   rather than use a refcounted StringBuffer (https://bugzil.la/1447951)

* Emilio Álvarez reduced the amount of memory the Gecko Profiler uses in
   content processes.

* Nathan Froyd fixed our static nsAtom code so it didn't generate static
   initializers (https://bugzil.la/1455178) and reduced the stack size of our
   image decoder threads (https://bugzil.la/1443932).

* Doug Thayer reduced the number of hang monitor threads we start in each
   process (https://bugzil.la/1448040)

* Boris Zbarsky removed a bunch of useless QueryInterface implementations
   (https://bugzil.la/1452862), made our static isInstance methods use less
   memory (https://bugzil.la/1452786), and generally deleted a bunch of
   useless, legacy nsI* interfaces that required us to add extra vtable
   pointers to a lot of DOM object instances.

And your humble author contributed the following:

* Changed our localization string bundles to use shared memory for bundles
   which are loaded into content processes (https://bugzil.la/1470365).
   This bug also adds some helpers which should make it easier to use shared
   memory for more things in the future.

* Made some changes to the script preloader to avoid keeping an unnecessary
   encoded copy of scripts in the content process (https://bugzil.la/1470793),
   to drop cached single-use scripts (https://bugzil.la/1471091), and to improve
   the set of scripts we load in content processes (https://bugzil.la/1471089).

* Made some smaller optimizations to avoid making copies of strings in
   preference callbacks (https://bugzil.la/1472523), and to remove the XPC
   compilation scope (https://bugzil.la/1442737)

Apologies to anyone I missed.


[1]: Please feel free to read the '.' as a '!' if you're so inclined. I
     generally shy away from exclamation marks.
[2]: If this seems like a massive rip-off of Ehsan's Quantum Flow newsletter
     format, that's because it is. Thanks, Ehsan :)
[3]: 50KB per process, which is to say 5MB across 100 content processes.
[4]: The total memory mapped by each content process which is not shared by
     other processes. Approximately equal to USS.
[5]: It's hard to be precise, since the numbers can be noisy, and are often
     bi-modal.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform

--
Kris Maglione
Senior Firefox Add-ons Engineer
Mozilla Corporation

Most of the great triumphs and tragedies of history are caused not by
people being fundamentally good or fundamentally evil, but by people
being fundamentally people.
        --Terry Pratchett
