This is awesome, and fixes a long-running pain point. Thanks Simon!

On Thu, Nov 3, 2016 at 12:17 PM, Simon Sapin <simon.sa...@exyr.org> wrote:

> # How it worked until recently
>
> Servo uses a crate called string-cache for string interning. It defines an
> `Atom` type that represents a string (it dereferences to `&str`), but it
> takes 8 bytes of stack space (whereas `String` takes three times that on
> 64-bit systems) and is fast to compare for equality or to hash (only
> comparing/hashing a u64 value).
>
> The memory representation of Atom is one u64, whose lower 2 bits indicate
> one of three variants:
>
> * If the string is in a set known at compile-time, the upper 32 bits are
> an index. (This is a "static atom".)
> * Otherwise, if the string is 7 bytes long or shorter, it is stored inline in
> the upper 56 bits and its length is stored in the next 4 bits. (This is an
> "inline atom".)
> * Otherwise, a `String` is stored in a global hash map whose entries are
> atomically reference-counted. The atom’s value is the address of the
> corresponding entry, cast to a pointer when needed. Entries are
> memory-aligned so that the lower bits of their address are zero. (This is a
> "dynamic atom".)
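
To make the layout above concrete, here is a minimal sketch of packing and unpacking such a u64. The tag values for inline and static atoms, the `Unpacked` enum, and the helper names are assumptions made for illustration (only the dynamic tag being zero follows from the alignment remark), and the real crate stores a pointer rather than a bare address.

```rust
// Hypothetical tag values for the lower 2 bits. Only DYNAMIC_TAG == 0 follows
// from the description: aligned entry addresses have zero low bits.
const TAG_MASK: u64 = 0b11;
const DYNAMIC_TAG: u64 = 0b00;
const INLINE_TAG: u64 = 0b01;
const STATIC_TAG: u64 = 0b10;

const MAX_INLINE_LEN: usize = 7;

// Illustrative decoded form of the three variants.
#[derive(Debug, PartialEq)]
enum Unpacked {
    Dynamic(u64),    // address of the hash map entry
    Inline(Vec<u8>), // up to 7 bytes of string data
    Static(u32),     // index into the compile-time set
}

// Static atom: index in the upper 32 bits.
fn pack_static(index: u32) -> u64 {
    ((index as u64) << 32) | STATIC_TAG
}

// Inline atom: length in bits 4..8, bytes in the upper 56 bits.
// Returns None when the string does not fit in 7 bytes.
fn pack_inline(s: &str) -> Option<u64> {
    let bytes = s.as_bytes();
    if bytes.len() > MAX_INLINE_LEN {
        return None;
    }
    let mut data = INLINE_TAG | ((bytes.len() as u64) << 4);
    for (i, &b) in bytes.iter().enumerate() {
        data |= (b as u64) << (8 * (i + 1));
    }
    Some(data)
}

// Inspect the lower 2 bits and decode the matching variant.
fn unpack(data: u64) -> Unpacked {
    match data & TAG_MASK {
        DYNAMIC_TAG => Unpacked::Dynamic(data),
        INLINE_TAG => {
            let len = ((data >> 4) & 0xF) as usize;
            let bytes: Vec<u8> = (0..len)
                .map(|i| (data >> (8 * (i + 1))) as u8)
                .collect();
            Unpacked::Inline(bytes)
        }
        STATIC_TAG => Unpacked::Static((data >> 32) as u32),
        _ => unreachable!("tag 0b11 is unused in this sketch"),
    }
}
```

Because the whole atom is a single u64, equality and hashing never need to look at the string bytes, which is where the speed mentioned above comes from.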
>
> Creating an `Atom` from `&str` involves hashing the string and has some
> computational cost. For strings in the static set, the `atom!` macro (used
> with a string literal: `atom!("foo")`) allows doing that computation at
> compile-time. It can also be used as a pattern in a `match` expression.
>
>
> # Why we changed it
>
> Adding a string to the static set (in order to use it with the `atom!`
> macro, or just to avoid some dynamic memory allocations) required:
>
> * Having a local fork of the string-cache repository
> * Making the addition
> * Making a pull request
> * Having a reviewer approve it
> * Waiting for Travis-CI to test it and homu to merge it
> * Having a reviewer publish a new version of string-cache on crates.io
> * Updating string-cache to the newly published version
>
> In particular, I want to use static atoms for all CSS property names.
> Doing so without changing string-cache would require contributors to go
> through all this when adding support for a new CSS property in Servo.
>
> This situation also forced non-Servo users of the crate to ship in their
> binaries a large number of static strings they might not care about.
>
>
> # What changed
>
> The goal was to move the set of static strings out of the string-cache
> crate, and allow users to define their own.
>
> It turns out we even want multiple crates in the same project to define
> their own static strings. For example, html5ever uses `atom!` for element
> and attribute names, and Servo’s style crate would use them for CSS
> property names, ideally with a static set generated automatically.
>
> So from string-cache 0.3, in
> https://github.com/servo/string-cache/pull/178 (based on initial work by
> @aidanhs, thanks again!), we made the `string_cache::Atom` type generic. It
> now takes a type parameter which provides the set of static strings.
>
> This type is typically not used directly anymore. Instead, the
> string_cache_codegen crate is intended to be used in a build script. Given
> static strings, a type alias name, and a macro name, it generates code that
> defines:
>
> * The appropriate data structure and trait impl for the hash set of static
> strings
> * A type alias like `type Foo = string_cache::Atom<FooStaticSet>;`. (This
> is the type that is typically used.)
> * A macro like the former `atom!` that takes a string literal in the
> static set and expands to a value of that type.
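
As a rough illustration of the last item, a generated macro can map each known string literal to a constant value, which also lets it appear as a pattern in `match`. Everything below (`FooAtom`, `FOO_STATIC_SET`, `foo_atom!`, `describe`) is a hypothetical, hand-written stand-in; the code actually emitted by string_cache_codegen is more involved and is not reproduced here.

```rust
// Hypothetical static set; a build script would generate this from a list.
const FOO_STATIC_SET: [&str; 2] = ["foo", "bar"];

// Simplified atom that only supports static entries (no inline or dynamic
// variants), so the macro expansion stays easy to read.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FooAtom {
    static_index: u32,
}

impl FooAtom {
    fn as_str(&self) -> &'static str {
        FOO_STATIC_SET[self.static_index as usize]
    }
}

// One arm per string in the set. A literal outside the set matches no arm,
// so the mistake is reported at compile time.
macro_rules! foo_atom {
    ("foo") => { FooAtom { static_index: 0 } };
    ("bar") => { FooAtom { static_index: 1 } };
}

// The same expansion is valid as an expression and as a pattern.
fn describe(atom: FooAtom) -> &'static str {
    match atom {
        foo_atom!("foo") => "the foo atom",
        foo_atom!("bar") => "the bar atom",
        _ => "some other atom",
    }
}
```

With this shape, `foo_atom!("foo")` costs nothing at run time: the hashing that `Atom::from` would do is replaced by a constant chosen during compilation.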
>
>
> An important aspect is that atoms with different static sets are different
> Rust types. One cannot be used in place of the other (conversion has to go
> through `&str`), and they cannot be compared for equality (but their
> dereferenced `&str`s can).
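
The type-level separation can be sketched with a marker type parameter. The trait name `StaticAtomSet`, the two set types, and the `String`-backed `Atom` below are simplifications assumed for illustration, not the crate's actual definitions.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

// Hypothetical stand-in for the trait that supplies a static set.
trait StaticAtomSet {
    fn static_strings() -> &'static [&'static str];
}

// Simplified atom: holds an owned String instead of a packed u64.
struct Atom<S: StaticAtomSet> {
    value: String,
    set: PhantomData<S>,
}

impl<S: StaticAtomSet> Atom<S> {
    fn new(s: &str) -> Self {
        Atom { value: s.to_owned(), set: PhantomData }
    }

    // Whether this string belongs to the static set of *this* atom type.
    fn is_static(&self) -> bool {
        S::static_strings().iter().any(|&x| x == self.value.as_str())
    }
}

impl<S: StaticAtomSet> Deref for Atom<S> {
    type Target = str;
    fn deref(&self) -> &str {
        &self.value
    }
}

// Equality is only defined between atoms sharing the same static set.
impl<S: StaticAtomSet> PartialEq for Atom<S> {
    fn eq(&self, other: &Self) -> bool {
        self.value == other.value
    }
}

struct LocalNameSet;
impl StaticAtomSet for LocalNameSet {
    fn static_strings() -> &'static [&'static str] {
        &["div", "span"]
    }
}

struct PropertySet;
impl StaticAtomSet for PropertySet {
    fn static_strings() -> &'static [&'static str] {
        &["color", "margin"]
    }
}

type LocalName = Atom<LocalNameSet>;
type PropertyName = Atom<PropertySet>;
```

Given `let local = LocalName::new("color");` and `let prop = PropertyName::new("color");`, the expression `local == prop` is rejected by the compiler, while `*local == *prop` compares the underlying `str` values and `PropertyName::new(&local)` converts through `&str`.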
>
>
> # How it’s used in Servo
>
> https://github.com/servo/html5ever now has a new html5ever_atoms crate
> that defines:
>
> * Three atom types, each with known common values in their static sets:
> `Prefix`, `Namespace`, and `LocalName`.
> * Respective macros `namespace_prefix!`, `namespace_url!`, and
> `local_name!`.
> * The `ns!` macro and `QualName` type previously in string-cache.
>
> Since https://github.com/servo/servo/pull/14043, the Servo repository
> contains a servo_atoms crate in ./components/atoms that defines an `Atom`
> type and corresponding `atom!` macro.
>
> `servo_atoms::Atom` is now used for everything else (other than prefixes,
> namespaces, element names, and content attribute names) that was previously
> `string_cache::Atom`.
>
> Note that the static set right now is just enough to make every usage of
> `atom!` work. If you had added strings (such as common attribute values) to
> string-cache’s static list purely to avoid dynamic atoms (with their memory
> allocation and reference-counting cost), they are likely not static
> anymore. (This is because I didn’t find a way to tell which item of the old
> set should go into which new set.)
>
> In the future, we can introduce more atom types as needed. (I’m planning
> to do so for CSS property names, with the static set generated
> automatically.)
>
>
> # Stylo
>
> Stylo is not affected by any of this. When built for stylo, the Prefix and
> LocalName types used in selectors are conditionally-compiled re-exports of
> gecko_string_cache::Atom, which is unchanged.
>
>
> --
> Simon Sapin
> _______________________________________________
> dev-servo mailing list
> dev-servo@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-servo
>