This is awesome, and fixes a long-running pain point. Thanks Simon!

On Thu, Nov 3, 2016 at 12:17 PM, Simon Sapin <simon.sa...@exyr.org> wrote:
> # How it worked until recently
>
> Servo uses a crate called string-cache for string interning. It defines an
> `Atom` type that represents a string (it dereferences to `&str`), but it
> takes 8 bytes of stack space (whereas `String` takes three times that on
> 64-bit systems) and is fast to compare for equality or to hash (only
> comparing/hashing a u64 value).
>
> The memory representation of Atom is one u64, whose lower 2 bits indicate
> one of three variants:
>
> * If the string is in a set known at compile-time, the upper 32 bits are
>   an index. (This is a "static atom".)
> * Otherwise, if the string is 7 bytes long or shorter, it is stored inline in
>   the upper 56 bits and its length is stored in the next 4 bits. (This is an
>   "inline atom".)
> * Otherwise, a `String` is stored in a global hash map whose entries are
>   atomically reference-counted. The atom’s value is the address of the
>   corresponding entry, cast to a pointer when needed. Entries are
>   memory-aligned so that the lower bits of their address are zero. (This is a
>   "dynamic atom".)
>
> Creating an `Atom` from `&str` involves hashing the string and has some
> computational cost. For strings in the static set, the `atom!` macro (used
> with a string literal: `atom!("foo")`) allows doing that computation at
> compile-time. It can also be used as a pattern in a `match` expression.
>
>
> # Why we changed it
>
> Adding a string to the static set (in order to use it with the `atom!`
> macro, or just to avoid some dynamic memory allocations) required:
>
> * Having a local fork of the string-cache repository
> * Making the addition
> * Making a pull request
> * Having a reviewer approve it
> * Waiting for Travis-CI to test it and homu to merge it
> * Having a reviewer publish a new version of string-cache on crates.io
> * Updating string-cache to the newly published version
>
> In particular, I want to use static atoms for all CSS property names.
> Doing so without changing string-cache would require contributors to go
> through all this when adding support for a new CSS property in Servo.
>
> This situation also forced non-Servo users of the crate to ship in their
> binaries a large number of static strings they might not care about.
>
>
> # What changed
>
> The goal was to move the set of static strings out of the string-cache
> crate, and allow users to define their own.
>
> It turns out we even want multiple crates in the same project to define
> their own static strings. For example, html5ever uses `atom!` for element
> and attribute names, and Servo’s style crate would use them for CSS
> property names, ideally with a static set generated automatically.
>
> So from string-cache 0.3, in
> https://github.com/servo/string-cache/pull/178 (based on initial work by
> @aidanhs, thanks again!), we made the `string_cache::Atom` type generic.
> It now takes a type parameter which provides the set of static strings.
>
> This type is typically not used directly anymore. Instead, the
> string_cache_codegen crate is intended to be used in a build script. Given
> static strings, a type alias name, and a macro name, it generates code that
> defines:
>
> * The appropriate data structure and trait impl for the hash set of static
>   strings
> * A type alias like `type Foo = string_cache::Atom<FooStaticSet>;`. (This
>   is the type that is typically used.)
> * A macro like the former `atom!` that takes a string literal in the
>   static set and expands to a value of that type.
>
>
> An important aspect is that atoms with different static sets are different
> Rust types. One cannot be used in place of the other (conversion has to go
> through `&str`), and they cannot be compared for equality (but their
> dereferenced `&str`s can).
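To make the last point concrete, here is a minimal standard-library-only sketch of an atom type that is generic over its static set. It is not the string-cache implementation: the `StaticSet` trait, the `LocalNameSet` and `PropertySet` marker types, the `PropertyName` alias, and the `to_property_name` helper are hypothetical names chosen for illustration, and the atom is backed by a plain `String` rather than the packed u64 described earlier.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

/// Hypothetical stand-in for a trait describing one set of static strings.
pub trait StaticSet {
    /// Strings known at compile time for this set.
    const STRINGS: &'static [&'static str];
}

/// Hypothetical atom type, generic over its static set.
/// Assumption: backed by a `String` to keep the sketch short.
pub struct Atom<S: StaticSet> {
    text: String,
    _set: PhantomData<S>,
}

impl<S: StaticSet> Atom<S> {
    /// True if the string belongs to this atom type's static set.
    pub fn is_static(&self) -> bool {
        S::STRINGS.iter().any(|s| *s == self.text.as_str())
    }
}

impl<'a, S: StaticSet> From<&'a str> for Atom<S> {
    fn from(s: &'a str) -> Self {
        Atom { text: s.to_owned(), _set: PhantomData }
    }
}

impl<S: StaticSet> Deref for Atom<S> {
    type Target = str;
    fn deref(&self) -> &str {
        &self.text
    }
}

/// Equality is only defined between atoms that share a static set.
impl<S: StaticSet> PartialEq for Atom<S> {
    fn eq(&self, other: &Self) -> bool {
        self.text == other.text
    }
}

/// Two illustrative static sets, each with its own marker type.
pub struct LocalNameSet;
impl StaticSet for LocalNameSet {
    const STRINGS: &'static [&'static str] = &["div", "span"];
}

pub struct PropertySet;
impl StaticSet for PropertySet {
    const STRINGS: &'static [&'static str] = &["color", "span"];
}

pub type LocalName = Atom<LocalNameSet>;
pub type PropertyName = Atom<PropertySet>;

/// Converting between atom types has to go through `&str`.
pub fn to_property_name(name: &LocalName) -> PropertyName {
    PropertyName::from(&**name)
}
```

With these definitions, `*LocalName::from("span") == *PropertyName::from("span")` compares the underlying `str` values, while comparing the two atoms directly is rejected by the compiler because `PartialEq` is only implemented between atoms of the same type.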
>
>
> # How it’s used in Servo
>
> https://github.com/servo/html5ever now has a new html5ever_atoms crate
> that defines:
>
> * Three atom types, each with known common values in their static sets:
>   `Prefix`, `Namespace`, and `LocalName`.
> * Respective macros `namespace_prefix!`, `namespace_url!`, and
>   `local_name!`.
> * The `ns!` macro and `QualName` type previously in string-cache.
>
> Since https://github.com/servo/servo/pull/14043, the Servo repository
> contains a servo_atoms crate in ./components/atoms that defines an `Atom`
> type and corresponding `atom!` macro.
>
> `servo_atoms::Atom` is now used for everything else (other than prefixes,
> namespaces, element names, and content attribute names) that was previously
> `string_cache::Atom`.
>
> Note that the static set right now is just enough to make every usage of
> `atom!` work. If you had added strings (such as common attribute values) to
> string-cache’s static list purely to avoid dynamic atoms (with their memory
> allocation and reference-counting cost), they are likely not static
> anymore. (This is because I didn’t find a way to tell which item of the old
> set should go into which new set.)
>
> In the future, we can introduce more atom types as needed. (I’m planning
> to do so for CSS property names, with the static set generated
> automatically.)
>
>
> # Stylo
>
> Stylo is not affected by any of this. When built for stylo, the Prefix and
> LocalName types used in selectors are conditionally compiled re-exports of
> gecko_string_cache::Atom, which is unchanged.
>
>
> --
> Simon Sapin
> _______________________________________________
> dev-servo mailing list
> dev-servo@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-servo