# How it worked until recently
Servo uses a crate called string-cache for string interning. It defines
an `Atom` type that represents a string (it dereferences to `&str`), but
it takes only 8 bytes of stack space (whereas `String` takes three times
that on 64-bit systems) and is fast to compare for equality or to hash
(only comparing/hashing a u64 value).
The memory representation of Atom is one u64, whose lower 2 bits
indicate one of three variants:
* If the string is in a set known at compile-time, the upper 32 bits are
an index. (This is a "static atom".)
* Otherwise, if the string is 7 bytes long or shorter, it is stored inline
in the upper 56 bits and its length is stored in the next 4 bits. (This
is an "inline atom".)
* Otherwise, a `String` is stored in a global hash map whose entries are
atomically reference-counted. The atom’s value is the address of the
corresponding entry, cast to a pointer when needed. Entries are
memory-aligned so that the lower bits of their address are zero. (This is
a "dynamic atom".)
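To make the layout above concrete, here is a minimal sketch of how the static and inline variants could be packed into a u64. This is not the crate's actual code: the tag values, the exact position of the length bits, and all function names are assumptions chosen for illustration. The dynamic variant is left out because it needs the global hash map.

```rust
// Assumed tag values; a dynamic atom is an aligned address, so its low bits are 0.
const TAG_MASK: u64 = 0b11;
const INLINE_TAG: u64 = 0b01;
const STATIC_TAG: u64 = 0b10;

// Assumption: the length lives in bits 4..8, just below the 56 bits of string data.
const LEN_SHIFT: u32 = 4;
const LEN_MASK: u64 = 0xF << LEN_SHIFT;
const MAX_INLINE_LEN: usize = 7;

/// Static atom: the index into the compile-time set goes in the upper 32 bits.
fn pack_static(index: u32) -> u64 {
    ((index as u64) << 32) | STATIC_TAG
}

/// Reads the index back, or returns None if this is not a static atom.
fn static_index(data: u64) -> Option<u32> {
    if data & TAG_MASK != STATIC_TAG {
        return None;
    }
    Some((data >> 32) as u32)
}

/// Inline atom: up to 7 bytes stored in the upper 56 bits (one byte per slot).
fn pack_inline(s: &str) -> Option<u64> {
    let bytes = s.as_bytes();
    if bytes.len() > MAX_INLINE_LEN {
        return None; // too long, would have to become a dynamic atom
    }
    let mut data = INLINE_TAG | ((bytes.len() as u64) << LEN_SHIFT);
    for (i, &b) in bytes.iter().enumerate() {
        // Byte 0 of the u64 holds the tag and length, so string bytes start at byte 1.
        data |= (b as u64) << (8 * (i + 1));
    }
    Some(data)
}

/// Recovers the string from an inline atom, or returns None for other variants.
fn unpack_inline(data: u64) -> Option<String> {
    if data & TAG_MASK != INLINE_TAG {
        return None;
    }
    let len = ((data & LEN_MASK) >> LEN_SHIFT) as usize;
    let bytes: Vec<u8> = (0..len).map(|i| (data >> (8 * (i + 1))) as u8).collect();
    String::from_utf8(bytes).ok()
}
```

With this sketch, `pack_inline("div")` round-trips through `unpack_inline`, while an 8-byte string is rejected. Since the whole value is one u64, equality and hashing never need to look at the string itself.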
Creating an `Atom` from `&str` involves hashing the string and has some
computational cost. For strings in the static set, the `atom!` macro
(used with a string literal: `atom!("foo")`) allows doing that
computation at compile-time. It can also be used as a pattern in a
`match` expression.
# Why we changed it
Adding a string to the static set (in order to use it with the `atom!`
macro, or just to avoid some dynamic memory allocations) required:
* Having a local fork of the string-cache repository
* Making the addition
* Making a pull request
* Having a reviewer approve it
* Waiting for Travis-CI to test it and homu to merge it
* Having a reviewer publish a new version of string-cache on crates.io
* Updating string-cache to the newly published version
In particular, I want to use static atoms for all CSS property names.
Doing so without changing string-cache would require contributors to go
through all this when adding support for a new CSS property in Servo.
This situation also forced non-Servo users of the crate to ship in their
binaries a large number of static strings they might not care about.
# What changed
The goal was to move the set of static strings out of the string-cache
crate, and allow users to define their own.
It turns out we even want multiple crates in the same project to define
their own static strings. For example, html5ever uses `atom!` for
element and attribute names, and Servo’s style crate would use them for
CSS property names, ideally with a static set generated automatically.
So from string-cache 0.3, in
https://github.com/servo/string-cache/pull/178 (based on initial work by
@aidanhs, thanks again!), we made the `string_cache::Atom` type generic.
It now takes a type parameter which provides the set of static strings.
This type is typically not used directly anymore. Instead, the
string_cache_codegen crate is intended to be used in a build script.
Given static strings, a type alias name, and a macro name, it generates
code that defines:
* The appropriate data structure and trait impl for the hash set of
static strings
* A type alias like `type Foo = string_cache::Atom<FooStaticSet>;`.
(This is the type that is typically used.)
* A macro like the former `atom!` that takes a string literal in the
static set and expands to a value of that type.
An important aspect is that atoms with different static sets are
different Rust types. One cannot be used in place of the other
(conversion has to go through `&str`), and they cannot be compared for
equality (but their dereferenced `&str`s can).
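The type-level consequence can be illustrated with a simplified model. This is only a sketch, not the real string-cache code: the `StaticAtomSet` trait, its `atoms()` method, the two marker types, and the `String` field (standing in for the packed u64) are all hypothetical names picked for the example.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

/// Hypothetical trait: each implementor provides one set of static strings.
trait StaticAtomSet {
    fn atoms() -> &'static [&'static str];
}

/// Simplified atom, generic over its static set.
/// It stores a plain String here instead of a packed u64.
struct Atom<S: StaticAtomSet> {
    value: String,
    set: PhantomData<S>,
}

impl<S: StaticAtomSet> Atom<S> {
    fn new(s: &str) -> Self {
        Atom { value: s.to_owned(), set: PhantomData }
    }

    /// Whether this string belongs to the static set of this atom type.
    fn is_static(&self) -> bool {
        S::atoms().iter().any(|&a| a == self.value.as_str())
    }
}

impl<S: StaticAtomSet> Deref for Atom<S> {
    type Target = str;
    fn deref(&self) -> &str {
        &self.value
    }
}

/// Equality is only defined between atoms that share the same static set.
impl<S: StaticAtomSet> PartialEq for Atom<S> {
    fn eq(&self, other: &Self) -> bool {
        self.value == other.value
    }
}

// Two independent static sets, as two crates might define them.
struct LocalNameSet;
impl StaticAtomSet for LocalNameSet {
    fn atoms() -> &'static [&'static str] {
        &["div", "span"]
    }
}

struct PropertySet;
impl StaticAtomSet for PropertySet {
    fn atoms() -> &'static [&'static str] {
        &["color", "display"]
    }
}

// Type aliases of the kind the generated code would define.
type LocalName = Atom<LocalNameSet>;
type PropertyName = Atom<PropertySet>;
```

Given `let a = LocalName::new("color");` and `let b = PropertyName::new("color");`, the expression `a == b` does not compile because the two types differ, whereas `&*a == &*b` compares the underlying `&str` values and is true. The same string is static in one set (`b.is_static()`) and not in the other.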
# How it’s used in Servo
https://github.com/servo/html5ever now has a new html5ever_atoms crate
that defines:
* Three atom types, each with known common values in their static sets:
`Prefix`, `Namespace`, and `LocalName`.
* Respective macros `namespace_prefix!`, `namespace_url!`, and
`local_name!`.
* The `ns!` macro and `QualName` type previously in string-cache.
Since https://github.com/servo/servo/pull/14043, the Servo repository
contains a servo_atoms crate in ./components/atoms that defines an
`Atom` type and corresponding `atom!` macro.
`servo_atoms::Atom` is now used for everything else (other than
prefixes, namespaces, element names, and content attribute names) that
was previously `string_cache::Atom`.
Note that the static set right now is just enough to make every usage of
`atom!` work. If you had added strings (such as common attribute values)
to string-cache’s static list purely to avoid dynamic atoms (with their
memory allocation and reference-counting cost), they are likely not
static anymore. (This is because I didn’t find a way to tell which item
of the old set should go into which new set.)
In the future, we can introduce more atom types as needed. (I’m planning
to do so for CSS property names, with the static set generated
automatically.)
# Stylo
Stylo is not affected by any of this. When built for Stylo, the `Prefix`
and `LocalName` types used in selectors are conditionally compiled
re-exports of `gecko_string_cache::Atom`, which is unchanged.
--
Simon Sapin