# How it worked until recently

Servo uses a crate called string-cache for string interning. It defines an `Atom` type that represents a string (it dereferences to `&str`) but takes only 8 bytes of stack space (whereas `String` takes three times that on 64-bit systems) and is fast to compare for equality or to hash (only a `u64` value is compared or hashed).

The memory representation of `Atom` is one `u64`, whose lower 2 bits indicate one of three variants:

* If the string is in a set known at compile-time, the upper 32 bits are an index. (This is a "static atom".)
* Otherwise, if the string is 7 bytes long or shorter, it is stored inline in the upper 56 bits and its length is stored in the next 4 bits. (This is an "inline atom".)
* Otherwise, a `String` is stored in a global hash map whose entries are atomically reference-counted. The atom’s value is the address of the corresponding entry, cast to a pointer when needed. Entries are memory-aligned so that the lower bits of their address are zero. (This is a "dynamic atom".)
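To make the bit layout concrete, here is a minimal sketch of packing and unpacking the static and inline variants. It is not the string-cache source: the tag values, the exact position of the length bits (bits 4 to 7 here), the byte order of the inline data, and all function names are assumptions made for illustration.

```rust
// Tag stored in the lower 2 bits. The numeric values are assumptions.
const TAG_MASK: u64 = 0b11;
const DYNAMIC_TAG: u64 = 0b00; // an aligned entry address already ends in zero bits
const INLINE_TAG: u64 = 0b01;
const STATIC_TAG: u64 = 0b10;

const LEN_SHIFT: u32 = 4; // assumption: the length lives in bits 4..8
const LEN_MASK: u64 = 0xF;
const DATA_SHIFT: u32 = 8; // string bytes fill the upper 56 bits
const MAX_INLINE_LEN: usize = 7;

#[derive(Debug, PartialEq)]
enum Kind {
    Dynamic,
    Inline,
    Static,
}

/// Reads the variant from the lower 2 bits.
fn kind(data: u64) -> Option<Kind> {
    match data & TAG_MASK {
        DYNAMIC_TAG => Some(Kind::Dynamic),
        INLINE_TAG => Some(Kind::Inline),
        STATIC_TAG => Some(Kind::Static),
        _ => None, // 0b11 is unused: there are only three variants
    }
}

/// Static atom: the index into the static set goes in the upper 32 bits.
fn pack_static(index: u32) -> u64 {
    ((index as u64) << 32) | STATIC_TAG
}

/// Inline atom: up to 7 bytes of string data plus a 4-bit length.
/// Returns `None` when the string is too long to be stored inline.
fn pack_inline(s: &str) -> Option<u64> {
    let bytes = s.as_bytes();
    if bytes.len() > MAX_INLINE_LEN {
        return None;
    }
    let mut data = ((bytes.len() as u64) << LEN_SHIFT) | INLINE_TAG;
    for (i, &b) in bytes.iter().enumerate() {
        data |= (b as u64) << (DATA_SHIFT + 8 * i as u32);
    }
    Some(data)
}

/// Recovers the string from the bits of an inline atom.
fn unpack_inline(data: u64) -> Option<String> {
    if data & TAG_MASK != INLINE_TAG {
        return None;
    }
    let len = ((data >> LEN_SHIFT) & LEN_MASK) as usize;
    if len > MAX_INLINE_LEN {
        return None;
    }
    let bytes: Vec<u8> = (0..len)
        .map(|i| (data >> (DATA_SHIFT + 8 * i as u32)) as u8)
        .collect();
    String::from_utf8(bytes).ok()
}
```

With this layout, two atoms for the same short string always have identical bits, which is why comparing or hashing the single `u64` is enough.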

Creating an `Atom` from `&str` involves hashing the string and has some computational cost. For strings in the static set, the `atom!` macro (used with a string literal: `atom!("foo")`) allows doing that computation at compile-time. It can also be used as a pattern in a `match` expression.


# Why we changed it

Adding a string to the static set (in order to use it with the `atom!` macro, or just to avoid some dynamic memory allocations) required:

* Having a local fork of the string-cache repository
* Making the addition
* Making a pull request
* Having a reviewer approve it
* Waiting for Travis-CI to test it and homu to merge it
* Having a reviewer publish a new version of string-cache on crates.io
* Updating string-cache to the newly published version

In particular, I want to use static atoms for all CSS property names. Doing so without changing string-cache would require contributors to go through all this when adding support for a new CSS property in Servo.

This situation also forced non-Servo users of the crate to ship in their binaries a large number of static strings they might not care about.


# What changed

The goal was to move the set of static strings out of the string-cache crate, and allow users to define their own.

It turns out we even want multiple crates in the same project to define their own static strings. For example, html5ever uses `atom!` for element and attribute names, and Servo’s style crate would use them for CSS property names, ideally with a static set generated automatically.

So from string-cache 0.3, in https://github.com/servo/string-cache/pull/178 (based on initial work by @aidanhs, thanks again!), we made the `string_cache::Atom` type generic. It now takes a type parameter which provides the set of static strings.

This type is typically not used directly anymore. Instead, the string_cache_codegen crate is intended to be used in a build script. Given static strings, a type alias name, and a macro name, it generates code that defines:

* The appropriate data structure and trait impl for the hash set of static strings.
* A type alias like `type Foo = string_cache::Atom<FooStaticSet>;`. (This is the type that is typically used.)
* A macro like the former `atom!` that takes a string literal in the static set and expands to a value of that type.


An important aspect is that atoms with different static sets are different Rust types. One cannot be used in place of the other (conversion has to go through `&str`), and they cannot be compared for equality (but their dereferenced `&str`s can).
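The following simplified sketch illustrates why atoms with different static sets do not mix. It is not the real string-cache code: the `StaticAtomSet` trait shape, the `TagAtom` and `CssAtom` aliases, and the `to_css_atom` helper are hypothetical, and interning is replaced by a plain `String` to keep the example short.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

/// Hypothetical stand-in for the trait that a static set implements.
trait StaticAtomSet {
    fn atoms() -> &'static [&'static str];
}

/// Simplified atom. The real type packs everything into one u64;
/// here the string is stored directly and no interning is done.
struct Atom<S: StaticAtomSet> {
    text: String,
    is_static: bool,
    set: PhantomData<S>,
}

impl<S: StaticAtomSet> From<&str> for Atom<S> {
    fn from(s: &str) -> Self {
        Atom {
            text: s.to_owned(),
            // Linear search for brevity; an assumption, not the real lookup.
            is_static: S::atoms().iter().any(|&a| a == s),
            set: PhantomData,
        }
    }
}

impl<S: StaticAtomSet> Deref for Atom<S> {
    type Target = str;
    fn deref(&self) -> &str {
        &self.text
    }
}

/// Equality is only defined between atoms that share a static set.
impl<S: StaticAtomSet> PartialEq for Atom<S> {
    fn eq(&self, other: &Self) -> bool {
        self.text == other.text
    }
}

struct TagSet;
impl StaticAtomSet for TagSet {
    fn atoms() -> &'static [&'static str] {
        &["div", "span"]
    }
}

struct CssSet;
impl StaticAtomSet for CssSet {
    fn atoms() -> &'static [&'static str] {
        &["color", "width"]
    }
}

type TagAtom = Atom<TagSet>;
type CssAtom = Atom<CssSet>;

/// Conversion between atom types has to go through `&str`.
fn to_css_atom(tag: &TagAtom) -> CssAtom {
    CssAtom::from(&**tag)
}
```

Here `TagAtom::from("width") == CssAtom::from("width")` is rejected by the compiler as a type mismatch, while `*TagAtom::from("width") == *CssAtom::from("width")` compiles because both sides dereference to `str`.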


# How it’s used in Servo

https://github.com/servo/html5ever now has a new html5ever_atoms crate that defines:

* Three atom types, each with known common values in their static sets: `Prefix`, `Namespace`, and `LocalName`.
* Respective macros `namespace_prefix!`, `namespace_url!`, and `local_name!`.
* The `ns!` macro and `QualName` type previously in string-cache.

Since https://github.com/servo/servo/pull/14043, the Servo repository contains a servo_atoms crate in ./components/atoms that defines an `Atom` type and corresponding `atom!` macro.

`servo_atoms::Atom` is now used for everything else (other than prefixes, namespaces, element names, and content attribute names) that was previously `string_cache::Atom`.

Note that the static set right now is just enough to make every usage of `atom!` work. If you had added strings (such as common attribute values) to string-cache’s static list purely to avoid dynamic atoms (with their memory allocation and reference-counting cost), they are likely not static anymore. (This is because I didn’t find a way to tell which item of the old set should go into which new set.)

In the future, we can introduce more atom types as needed. (I’m planning to do so for CSS property names, with the static set generated automatically.)


# Stylo

Stylo is not affected by any of this. When built for Stylo, the `Prefix` and `LocalName` types used in selectors are conditionally-compiled re-exports of `gecko_string_cache::Atom`, which is unchanged.


--
Simon Sapin
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo
