On 6/7/13 9:43 AM, Benjamin Smedberg wrote:
> On 6/5/2013 8:42 PM, Patrick Walton wrote:
>>   * Gecko has mutable strings and this is bad for performance
>
> I'd like to understand/challenge this statement. Is this bad for
> performance because you have to check the extra SHARED flag on write?

So I thought about this some more, and I think I have a clearer description of the problem I perceive with Gecko's strings here. It's long; please bear with me.

Gecko's strings are not only mutable; they also have lots of different ownership models: at least refcounted buffers, dependent strings, owned buffers, and stack buffers. All of these are basically implemented as a single C++ class with various flag bits indicating exactly what its current ownership looks like. There are lots of subclasses of this class (nsDependentString, nsAutoString, etc.), but all they do is call its constructor with the right arguments. The constructors are out-of-line (though maybe that should be changed, since they're pretty simple).

Any mutation on Gecko's strings needs to ensure the string 1) actually has a unique copy of the data and 2) has sufficient memory in the buffer to do the mutation. So every mutation needs to go check the four possible flag values above, and do different things for the different cases to ensure the above two things. It's a lot of code, which means it's not inlined into the mutator (even just the basic "check whether we own this data and have enough capacity" code is rather big to inline into all the string mutators, of which Gecko has lots). So every mutation function ends up with an out-of-line function call to do all of that checking (and in practice I think it's more than one function call in there). Thus appending a char using Append() to a string that already has an owned buffer with enough capacity, which should in theory be pretty fast, is still rather slow.
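As a rough illustration of the cost described above, here is a minimal sketch of a single string type whose ownership is tracked by flag bits. Every name in it (FlaggedString, the k* flags, EnsureMutable, Release) is made up for this example and is not Gecko's actual code; it only assumes the four ownership models listed earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical flag bits; Gecko's real flag names and values differ.
enum OwnershipFlag : std::uint32_t {
  kDependent = 1u << 0,  // borrows memory owned by someone else
  kShared = 1u << 1,     // refcounted buffer that may have other users
  kOwned = 1u << 2,      // heap buffer owned exclusively by this string
  kStack = 1u << 3,      // fixed buffer that lives next to the string
};

struct FlaggedString {
  char* data;
  std::size_t length;
  std::size_t capacity;  // writable bytes at `data`; 0 for dependent strings
  std::uint32_t flags;   // exactly one OwnershipFlag in this sketch
  int* refcount;         // used only when kShared is set
};

// Drops this string's claim on its buffer. A destructor for such a class has
// to do the same flag inspection, which is why it is not trivial either.
void Release(FlaggedString& s) {
  if (s.flags & kOwned) {
    std::free(s.data);
  } else if ((s.flags & kShared) && --*s.refcount == 0) {
    std::free(s.data);
    delete s.refcount;
  }
  // kDependent and kStack buffers belong to someone else: nothing to free.
}

// Too large to inline into every mutator, so each mutation pays for a call.
void EnsureMutable(FlaggedString& s, std::size_t needed) {
  const bool unique = (s.flags & (kOwned | kStack)) != 0 ||
                      ((s.flags & kShared) != 0 && *s.refcount == 1);
  if (unique && needed <= s.capacity) return;  // the case that should be cheap

  // Otherwise copy into a fresh, exclusively owned heap buffer.
  const std::size_t newCapacity = needed * 2;
  char* buffer = static_cast<char*>(std::malloc(newCapacity));
  if (!buffer) std::abort();
  if (s.length) std::memcpy(buffer, s.data, s.length);
  Release(s);
  s.data = buffer;
  s.capacity = newCapacity;
  s.flags = kOwned;  // the ownership model changes mid-lifetime
  s.refcount = nullptr;
}

void Append(FlaggedString& s, char c) {
  EnsureMutable(s, s.length + 1);  // checked on every single mutation
  assert(s.length < s.capacity);
  s.data[s.length++] = c;
}
```

Even when the string already owns a big enough buffer, Append() has to go through EnsureMutable() before it can store one character.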

Furthermore, the ownership model of a string can change during its lifetime. This includes things like a dependent string ending up with a refcounted buffer or whatnot (since nothing stops that from happening). As a result, the destructor of the shared string class is also rather complicated, and hence also out-of-line. But even when inlined it's pretty complicated.

The upshot of all that in the bindings cases I was dealing with is that this code:

  { nsDependentString foo(chars, length); }

ends up with an out-of-line constructor and destructor, and the destructor has to do a nontrivial amount of work. In a perfect world, the constructor of a dependent string would just store the chars and length, and its destructor would be empty, I would think.
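For comparison, a dependent string that does only what the "perfect world" version needs might look like the sketch below. DependentStringSketch and ScopedLength are hypothetical names for this example, not Gecko's nsDependentString; the static_assert simply checks that the destructor really is trivial.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical type: stores a pointer and a length, nothing else.
class DependentStringSketch {
 public:
  DependentStringSketch(const char16_t* chars, std::size_t length)
      : mChars(chars), mLength(length) {}
  // No user-declared destructor, so destruction compiles to nothing.

  const char16_t* Data() const { return mChars; }
  std::size_t Length() const { return mLength; }

 private:
  const char16_t* mChars;
  std::size_t mLength;
};

static_assert(std::is_trivially_destructible<DependentStringSketch>::value,
              "a dependent string should have nothing to clean up");

// Mirrors the `{ nsDependentString foo(chars, length); }` pattern above:
// two stores on entry, no work on scope exit.
inline std::size_t ScopedLength(const char16_t* chars, std::size_t length) {
  DependentStringSketch foo(chars, length);
  assert(foo.Data() == chars);
  return foo.Length();
}
```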

I think it's possible to get to that point with the following:

1) A shared base class for all strings which does not allow mutation. This would just store the chars and length and maybe a flags field (to indicate things like "I'm null-terminated" as needed); its constructor would just set those fields and its destructor would be empty. This type would be used when you want to pass an immutable string argument, and any string could be passed to a function taking such an argument.

2) Subclasses that have different ownership models and allow mutations that make sense for their particular ownership models. The mutations could have much faster, and inlined, ownership/capacity checks. These would be used as return values for functions.
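A minimal sketch of points 1) and 2), assuming 8-bit chars for brevity. All class and function names here (ImmutableString, DependentString, OwnedString, CountChar, Repeat) are invented for illustration, and copying is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// 1) Shared immutable base: chars and length, trivial constructor, empty
//    destructor. Functions that only read a string take this by reference.
class ImmutableString {
 public:
  const char* Data() const { return mData; }
  std::size_t Length() const { return mLength; }

 protected:
  ImmutableString(const char* data, std::size_t length)
      : mData(data), mLength(length) {}
  ~ImmutableString() = default;  // non-virtual; never destroyed via the base
  const char* mData;
  std::size_t mLength;
};

// 2a) Dependent string: no Append(), but it can be pointed elsewhere.
class DependentString final : public ImmutableString {
 public:
  DependentString(const char* data, std::size_t length)
      : ImmutableString(data, length) {}
  void Rebind(const char* data, std::size_t length) {
    mData = data;
    mLength = length;
  }
};

// 2b) Exclusively owned heap string: the type itself says the buffer is
//     unique, so Append() needs only one inline capacity check.
class OwnedString final : public ImmutableString {
 public:
  OwnedString() : ImmutableString(nullptr, 0), mBuffer(nullptr), mCapacity(0) {}
  OwnedString(OwnedString&& other)
      : ImmutableString(other.mData, other.mLength),
        mBuffer(other.mBuffer), mCapacity(other.mCapacity) {
    other.mData = other.mBuffer = nullptr;
    other.mLength = other.mCapacity = 0;
  }
  OwnedString(const OwnedString&) = delete;  // copying: not addressed here
  OwnedString& operator=(const OwnedString&) = delete;
  ~OwnedString() { std::free(mBuffer); }

  void Append(char c) {
    if (mLength == mCapacity) Grow();
    assert(mLength < mCapacity);
    mBuffer[mLength++] = c;
  }

 private:
  void Grow() {
    const std::size_t newCapacity = mCapacity ? mCapacity * 2 : 16;
    char* buffer = static_cast<char*>(std::realloc(mBuffer, newCapacity));
    if (!buffer) std::abort();
    mData = mBuffer = buffer;
    mCapacity = newCapacity;
  }
  char* mBuffer;  // same memory as mData, but writable
  std::size_t mCapacity;
};

// Reader: accepts any string type through the immutable base.
inline std::size_t CountChar(const ImmutableString& s, char c) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < s.Length(); ++i) n += (s.Data()[i] == c);
  return n;
}

// Writer: declares the ownership model of its result in its signature.
inline OwnedString Repeat(char c, std::size_t times) {
  OwnedString out;
  for (std::size_t i = 0; i < times; ++i) out.Append(c);
  return out;
}
```

The protected, non-virtual base destructor is one possible choice here: it keeps the base free of any cleanup work while preventing destruction through a base pointer.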

There are, of course, some drawbacks. Specifically, any function writing to a string needs to effectively declare in its signature what the ownership model of that string will be. And the different mutable strings end up with API duplication if they want to expose similar APIs. Though note that the "similar API" thing only applies to owned/stack/refcounted strings; dependent strings do not in fact want an Append(), though they may want a Rebind()... Also, it's not clear to me what the best way is of handling "copying" a string in this setup; it's possible that the flags on the base class should in fact indicate the ownership model at least to the extent that a shareable buffer can be shared as needed. In practice, that's how nsString behaves in Gecko-land: shares the incoming buffer if possible, else allocates a new shareable buffer and copies.

This is somewhat similar to the "mutable string that you can write to, then mark as immutable" approach but enforced at compile time by the type system: if you're a specific string type you're basically a stringbuilder, and if you're the shared base class you're immutable.

-Boris
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo
