On Wed, Jul 11, 2018 at 01:49:04PM +0200, Jean-Yves Avenard wrote:
One place where we could gain heaps is in the media stack.
Currently, each content process allocates a thread pool with at least 8 threads for use by the media decoders, each thread with a default stack size of 256kB.
(https://searchfox.org/mozilla-central/source/xpcom/threads/nsIThreadManager.idl#53)
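
To make the stack-size knob concrete, here is a minimal sketch using plain POSIX threads (this is not the actual nsIThreadPool code; the thread count and constants are only illustrative):

#include <cstddef>
#include <pthread.h>

// Illustrative only: not Gecko's thread-pool implementation.
constexpr std::size_t kDecoderStackSize = 256 * 1024;  // the current 256kB default

void* DecodeLoop(void*) {
  // Decoder work runs here; deep call chains into system frameworks
  // (e.g. CoreVideo) are what pushed the required stack size up.
  return nullptr;
}

int main() {
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  // Request a 256kB stack instead of the platform default (often 1-8MB).
  pthread_attr_setstacksize(&attr, kDecoderStackSize);

  pthread_t threads[8];  // the media pool starts with at least 8 such threads
  for (pthread_t& t : threads) {
    pthread_create(&t, &attr, DecodeLoop, nullptr);
  }
  for (pthread_t& t : threads) {
    pthread_join(t, nullptr);
  }
  pthread_attr_destroy(&attr);
  return 0;
}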

That stack size has been increased over the years due to the growing use of system frameworks (in particular the Mac CoreVideo framework, which uses over 200kB on its own), and right now even 256kB isn’t enough for the new AV1 decoder from libaom.

One piece of work the media team has started is to have all those decoders run in a dedicated process. This work was undertaken mostly for security reasons, but there will be side gains memory-wise.

This work is tracked in bug 1471535 (https://bugzilla.mozilla.org/show_bug.cgi?id=1471535)

Once this is done, and we no longer call decoders in the content process, the decoder process could use an increased stack size, while the content process default stack size could be reduced to 128kB (and maybe even 64kB).

That alone may be sufficient to achieve the goals you mentioned.

Thanks. Boris added this as a blocker.

It looks like it will be helpful, but unfortunately won't give us the 2MB that simple arithmetic would suggest. On Windows, at least (and probably elsewhere, but that needs confirming), thread stacks are lazily committed, so as long as the decoders aren't used in a process, the overhead is probably closer to 25KB per thread.
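
To make the reserve-vs-commit distinction concrete, here is a rough Windows sketch (not how the media threads are actually created; it only shows the lazily committed reservation):

#include <windows.h>

DWORD WINAPI IdleThread(LPVOID) {
  // Never touches more than the first page or two of its stack.
  return 0;
}

int main() {
  // Reserve 256kB of address space for the new thread's stack, but let the
  // kernel commit pages lazily as the thread actually touches them.
  HANDLE t = CreateThread(/* security  */ nullptr,
                          /* stack     */ 256 * 1024,
                          IdleThread,
                          /* parameter */ nullptr,
                          STACK_SIZE_PARAM_IS_A_RESERVATION,
                          /* thread id */ nullptr);
  if (t) {
    // Until the thread runs deep call chains, only the initially committed
    // pages (plus a guard page) count against private memory, not the full
    // 256kB reservation.
    WaitForSingleObject(t, INFINITE);
    CloseHandle(t);
  }
  return 0;
}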

Shrinking the size of the thread pool and lazily spinning up threads when they're first needed would probably save us 200KB per process, though...
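
A minimal sketch of the lazy spin-up idea, assuming a simple task-queue pool rather than the real nsThreadPool: the pool owns no threads at process startup and only creates them, up to a cap, once tasks actually arrive.

#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class LazyPool {
 public:
  explicit LazyPool(std::size_t aMaxThreads) : mMaxThreads(aMaxThreads) {}

  void Dispatch(std::function<void()> aTask) {
    std::lock_guard<std::mutex> lock(mMutex);
    mTasks.push(std::move(aTask));
    // Only start a worker when there is actual work, up to the cap.
    if (mThreads.size() < mMaxThreads) {
      mThreads.emplace_back([this] { Run(); });
    }
    mCond.notify_one();
  }

  ~LazyPool() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mShutdown = true;
    }
    mCond.notify_all();
    for (auto& t : mThreads) t.join();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mMutex);
        mCond.wait(lock, [this] { return mShutdown || !mTasks.empty(); });
        if (mTasks.empty()) return;  // shutting down
        task = std::move(mTasks.front());
        mTasks.pop();
      }
      task();
    }
  }

  std::mutex mMutex;
  std::condition_variable mCond;
  std::queue<std::function<void()>> mTasks;
  std::vector<std::thread> mThreads;
  const std::size_t mMaxThreads;
  bool mShutdown = false;
};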

An immediate intermediate step could be to use two different stack sizes, as we 
pretty much know which decoders need more than the others.
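
For example, something along these lines (the decoder names and sizes are invented for illustration; the real numbers would come from measurement):

#include <cstddef>

enum class DecoderKind { Vorbis, Opus, VP9, AV1, PlatformH264 };

constexpr std::size_t kSmallDecoderStack = 128 * 1024;  // plenty for most codecs
constexpr std::size_t kLargeDecoderStack = 512 * 1024;  // system frameworks / libaom

constexpr std::size_t StackSizeFor(DecoderKind aKind) {
  switch (aKind) {
    case DecoderKind::AV1:
    case DecoderKind::PlatformH264:
      return kLargeDecoderStack;  // deep call chains or large local buffers
    default:
      return kSmallDecoderStack;
  }
}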

JY


On 10 Jul 2018, at 8:19 pm, Kris Maglione <kmagli...@mozilla.com> wrote:

Welcome to the first edition of the Fission MemShrink newsletter.[1]

In this edition, I'll sum up what the project is, and why it matters to you. In 
subsequent editions, I'll give updates on progress that we've made, and areas 
that we'll need to focus on next.[2]


The Fission MemShrink project is one of the most easily overlooked aspects of 
Project Fission (also known as Site Isolation), but is absolutely critical to 
its success, and it will require a company- and community-wide effort to 
meet its goals.

The problem is thus: In order for site isolation to work, we need to be able to 
run *at least* 100 content processes in an average Firefox session. Each of 
those processes has its own base memory overhead—memory we use just for 
creating the process, regardless of what's running in it. In the post-Fission 
world, that overhead needs to be less than 10MB per process in order to keep 
the extra overhead from Fission below 1GB. Right now, on our best-case 
platform, Windows 10, it's somewhere between 17 and 21MB. Linux and OS X hover 
between 25 and 35MB. In other words, between 2 and 3.5GB for an ordinary 
session.

That means that, in the best case, we need to reduce the memory we use in 
content processes by *at least* 7MB. The problem, of course, is that there are 
only so many places we can cut memory without losing functionality, and even 
fewer places where we can make big wins. But, there are lots of places we can 
make small and medium-sized wins.

So, to put the task into perspective: for each size of overhead we might be able 
to cut, here is the number of cuts of that size we would need in order to save 
1MB:

250KB:   4
100KB:  10
 75KB:  13
 50KB:  20
 20KB:  50
 10KB: 100
  5KB: 200

Now remember: we need to do *all* of these in order to reach our goal. It's not 
a matter of one 250KB improvement or 50 5KB improvements. It's 4 250KB *and* 
200 5KB improvements. There just aren't enough places we can cut 250KB. If we 
fall short in any of those areas, Project Fission will fail, and Firefox will 
be the only major browser without site isolation.

But it won't fail, because all of you are awesome, and this is a totally 
achievable goal if we all throw our effort behind it.

Essentially what this means, though, is that if we identify an area of overhead 
that's 50KB[3] or larger that can be eliminated, it *has* to be eliminated. 
There just aren't that many large chunks to remove. They all need to go. And if 
an area of code has a dozen 5KB chunks that can be eliminated, maybe they don't 
all have to go, but at least half of them do. The more the better.


To help us triage these issues, we have a tracking bug 
(https://bugzil.la/memshrink-content), and a per-bug whiteboard tag 
([overhead:...]) which gives an estimate of how much per-process overhead we 
believe fixing that bug would eliminate. Please feel free to add blockers to 
the tracking bug if you think they're relevant, and to add or update [overhead] 
tags if you have reasonable estimates.


With all of that said, here's a brief update of the progress we've made so far:

In the past month, unique memory per process[4] has dropped 3-4MB[5], and JS 
memory usage in particular has dropped 1.1-1.9MB.

Particular credit goes to:

* Eric Rahm added an AWSY test suite to track base content process memory
 (https://bugzil.la/1442361). Results:

  Resident unique: 
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684862,1,4&series=mozilla-central,1684846,1,4&series=mozilla-central,1685133,1,4&series=mozilla-central,1685127,1,4
  Explicit allocations: 
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1706218,1,4&series=mozilla-inbound,1706220,1,4&series=mozilla-inbound,1706216,1,4
  JS: 
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684866,1,4&series=mozilla-central,1685137,1,4&series=mozilla-central,1685131,1,4

* Andrew McCreight created a tool for tracking JS memory usage, and figuring
 out which scripts and objects are responsible for how much of it
 (https://bugzil.la/1463569).

* Andrew and Nika Layzell also completely rewrote the way we handle XPIDL type
 info so that it's statically compiled into the executable and shared between
 all processes (https://bugzil.la/1438688, https://bugzil.la/1444745).

* Felipe Gomes split a bunch of code out of frame scripts so that it could be
 lazily loaded only when needed (https://bugzil.la/1467278, ...) and added a
 whitelist of JSMs that are allowed to be loaded at content process startup
 (https://bugzil.la/1471066)

* I did a bit of this too, and also prevented us from loading some other JSMs
 before we need them (https://bugzil.la/1470333, https://bugzil.la/1469719,
 ...)

* Nick Nethercote made dynamic nsAtoms allocate their string storage inline
 rather than use a refcounted StringBuffer (https://bugzil.la/1447951); see
 the sketch after this list.

* Emilio Álvarez reduced the amount of memory the Gecko Profiler uses in
 content processes.

* Nathan Froyd fixed our static nsAtom code so it didn't generate static
 initializers (https://bugzil.la/1455178) and reduced the stack size of our
 image decoder threads (https://bugzil.la/1443932).

* Doug Thayer reduced the number of hang monitor threads we start in each
 process (https://bugzil.la/1448040)

* Boris Zbarsky removed a bunch of useless QueryInterface implementations
 (https://bugzil.la/1452862), made our static isInstance methods use less
 memory (https://bugzil.la/1452786), and generally deleted a bunch of
 useless, legacy nsI* interfaces that required us to add extra vtable
 pointers to a lot of DOM object instances.
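
As promised above, a rough sketch of the inline-storage idea behind the nsAtom change (this is not the actual nsAtom layout; it only shows the single-allocation pattern):

#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

class InlineAtom {
 public:
  static InlineAtom* Create(const char16_t* aChars, std::size_t aLength) {
    // Single allocation: the header plus the (null-terminated) characters,
    // instead of a pointer to a separately allocated, refcounted buffer.
    void* mem =
        std::malloc(sizeof(InlineAtom) + (aLength + 1) * sizeof(char16_t));
    return new (mem) InlineAtom(aChars, aLength);
  }

  const char16_t* GetUTF16String() const {
    // The characters live directly after this object.
    return reinterpret_cast<const char16_t*>(this + 1);
  }

  std::size_t Length() const { return mLength; }

  void Destroy() {
    this->~InlineAtom();
    std::free(this);
  }

 private:
  InlineAtom(const char16_t* aChars, std::size_t aLength) : mLength(aLength) {
    char16_t* storage = const_cast<char16_t*>(GetUTF16String());
    std::memcpy(storage, aChars, aLength * sizeof(char16_t));
    storage[aLength] = u'\0';
  }
  ~InlineAtom() = default;

  std::size_t mLength;
};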

And your humble author contributed the following:

* Changed our localization string bundles to use shared memory for bundles
 which are loaded into content processes (https://bugzil.la/1470365).
 This bug also adds some helpers which should make it easier to use shared
 memory for more things in the future; see the sketch after this list.

* Made some changes to the script preloader to avoid keeping an unnecessary
 encoded copy of scripts in the content process (https://bugzil.la/1470793),
 to drop cached single-use scripts (https://bugzil.la/1471091), and to improve
 the set of scripts we load in content processes (https://bugzil.la/1471089).

* Made some smaller optimizations to avoid making copies of strings in
 preference callbacks (https://bugzil.la/1472523), and to remove the XPC
 compilation scope (https://bugzil.la/1442737)
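
And the promised sketch of the shared-memory approach, using plain POSIX shared memory rather than the helpers actually added in bug 1470365 (the bundle name and function names here are invented for the example):

#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Parent process: build the read-only blob once in a named shared region.
int CreateSharedBundle(const char* aData, std::size_t aSize) {
  int fd = shm_open("/my-string-bundle", O_CREAT | O_RDWR, 0600);
  if (fd < 0 || ftruncate(fd, aSize) != 0) return -1;
  void* mem = mmap(nullptr, aSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (mem == MAP_FAILED) return -1;
  std::memcpy(mem, aData, aSize);
  munmap(mem, aSize);
  return fd;  // the fd (or the name) is what gets handed to child processes
}

// Child process: map the same region read-only. The pages are shared with the
// parent and the other children, so they do not count against the child's
// unique (USS) memory.
const char* MapSharedBundle(int aFd, std::size_t aSize) {
  void* mem = mmap(nullptr, aSize, PROT_READ, MAP_SHARED, aFd, 0);
  return mem == MAP_FAILED ? nullptr : static_cast<const char*>(mem);
}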

Apologies to anyone I missed.


[1]: Please feel free to read the '.' as a '!' if you're so inclined. I
   generally shy away from exclamation marks.
[2]: If this seems like a massive rip-off of Ehsan's Quantum Flow newsletter
   format, that's because it is. Thanks, Ehsan :)
[3]: 50KB per process, which is to say 5MB across 100 content processes.
[4]: The total memory mapped by each content process which is not shared by
   other processes. Approximately equal to USS.
[5]: It's hard to be precise, since the numbers can be noisy, and are often
   bi-modal.




--
Kris Maglione
Senior Firefox Add-ons Engineer
Mozilla Corporation

It's always good to take an orthogonal view of something.  It develops
ideas.
        --Ken Thompson
