On 6/7/13 9:43 AM, Benjamin Smedberg wrote:
> On 6/5/2013 8:42 PM, Patrick Walton wrote:
>> * Gecko has mutable strings and this is bad for performance
> I'd like to understand/challenge this statement. Is this bad for
> performance because you have to check the extra SHARED flag on write?
So I thought about this some more, and I think I have a clearer
description of the problem I perceive with Gecko's strings here. It's
long; please bear with me.
Gecko's strings are not only mutable but there are lots of different
ownership models: at least refcounted buffers, dependent strings, owned
buffers, stack buffers. All of these are basically implemented as a
single C++ class with various flag bits indicating exactly what its
current ownership looks like. There are lots of subclasses of this
class (nsDependentString, nsAutoString, etc.), but all they do is call its
constructor with the right arguments. The constructors are out-of-line
(though maybe that should be changed, since they're pretty simple).
Any mutation on Gecko's strings needs to ensure the string 1) actually
has a unique copy of the data and 2) has sufficient memory in the buffer
to do the mutation. So every mutation needs to go check the four
possible flag values above, and do different things for the different
cases to ensure the above two things. It's a lot of code, which means
it's not inlined into the mutator (even just the basic "check whether we
own this data and have enough capacity" code is rather big to inline
into all the string mutators, of which Gecko has lots). So every
mutation function ends up with an out-of-line function call to do all of
that checking (and in practice I think it's more than one function call
in there). Thus appending a char using Append() to a string that
already has an owned buffer with enough capacity, which should in theory
be pretty fast, is still rather slow.
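A rough sketch of the shape being described, to make the cost concrete. This is not the real Gecko code: FlaggedString, EnsureMutable, ReleaseBuffer, the F_* flags and the separate mRefCnt pointer are all made-up names and simplifications, and the growth policy is arbitrary. The point is only that Append() has to go through an out-of-line check that looks at the ownership flags before it can write one char.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// All names here are hypothetical; this is not the real Gecko string class.
enum : uint32_t {
  F_SHARED    = 1u << 0,  // refcounted buffer, possibly shared with others
  F_DEPENDENT = 1u << 1,  // points at memory someone else owns
  F_OWNED     = 1u << 2,  // heap buffer owned only by this string
  F_STACK     = 1u << 3,  // buffer living inside an auto string
};

struct FlaggedString {
  char* mData;
  size_t mLength;
  size_t mCapacity;
  uint32_t mFlags;
  int* mRefCnt;  // only meaningful when F_SHARED is set (an assumption)

  // Starts life as a dependent string over someone else's chars.
  FlaggedString(const char* aChars, size_t aLength)
      : mData(const_cast<char*>(aChars)), mLength(aLength), mCapacity(0),
        mFlags(F_DEPENDENT), mRefCnt(nullptr) {}
  FlaggedString(const FlaggedString&) = delete;
  FlaggedString& operator=(const FlaggedString&) = delete;
  ~FlaggedString() { ReleaseBuffer(); }

  // Even the fast case pays for the out-of-line call below.
  void Append(char aChar) {
    if (EnsureMutable(mLength + 1)) mData[mLength++] = aChar;
  }

  bool EnsureMutable(size_t aNeeded);  // defined out of line
  void ReleaseBuffer();                // defined out of line
};

void FlaggedString::ReleaseBuffer() {
  if (mFlags & F_OWNED) {
    std::free(mData);
  } else if ((mFlags & F_SHARED) && --*mRefCnt == 0) {
    std::free(mData);
    delete mRefCnt;
  }
  // Dependent and stack buffers need no cleanup.
}

bool FlaggedString::EnsureMutable(size_t aNeeded) {
  bool unique = (mFlags & (F_OWNED | F_STACK)) ||
                ((mFlags & F_SHARED) && *mRefCnt == 1);
  if (unique && aNeeded <= mCapacity) return true;  // fast case
  // Slow case: move to a fresh owned heap buffer.
  size_t capacity = aNeeded * 2;
  char* buffer = static_cast<char*>(std::malloc(capacity));
  if (!buffer) return false;
  if (mLength) std::memcpy(buffer, mData, mLength);
  ReleaseBuffer();
  mData = buffer;
  mCapacity = capacity;
  mFlags = F_OWNED;
  return true;
}

// Usage: a dependent string silently becomes an owned one on first write.
inline void Example() {
  FlaggedString s("abc", 3);
  s.Append('d');
  assert(s.mFlags == F_OWNED && s.mLength == 4);
}
```

Note that the destructor has to branch on the same flags, because the string may no longer have the ownership model it was constructed with.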
Furthermore, the ownership model of a string can change during its
lifetime. This includes things like a dependent string ending up with a
refcounted buffer or whatnot (since nothing stops that from happening).
As a result, the destructor of the shared string class is also rather
complicated, and hence also out-of-line. But even when inlined it's
pretty complicated.
The upshot of all that in the bindings cases I was dealing with is that
this code:
{ nsDependentString foo(chars, length); }
ends up with out-of-line constructor and destructor, and the destructor
has to do a nontrivial amount of work. In a perfect world, the
constructor of a dependent string should just come down to storing the
chars and length and its destructor should be empty, I would think.
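For comparison, a minimal sketch of what that "perfect world" dependent string could look like. DependentView is a hypothetical name, not an existing Gecko class, and it leaves out everything except the two fields. With no declared destructor the type is trivially destructible, so leaving the scope should compile to nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical type for illustration; not an existing Gecko class.
class DependentView {
 public:
  // The constructor only stores the pointer and the length.
  constexpr DependentView(const char16_t* aChars, size_t aLength)
      : mChars(aChars), mLength(aLength) {}
  // No destructor is declared, so destruction is trivial (a no-op).
  constexpr const char16_t* Data() const { return mChars; }
  constexpr size_t Length() const { return mLength; }

 private:
  const char16_t* mChars;
  size_t mLength;
};

static_assert(std::is_trivially_destructible<DependentView>::value,
              "scope exit should not need any destructor code");

// Same shape as the `{ nsDependentString foo(chars, length); }` case.
inline void Example(const char16_t* aChars, size_t aLength) {
  DependentView foo(aChars, aLength);
  assert(foo.Length() == aLength);
}
```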
I think it's possible to get to that point with the following:
1) A shared base class for all strings which does not allow mutation.
This would just store the chars and length and maybe a flags field (to
indicate things like "I'm null-terminated" as needed); its constructor
would just set those fields and its destructor would be empty. This
type would be used when you want to pass an immutable string argument,
and any string could be passed to a function taking such an argument.
2) Subclasses that have different ownership models and allow mutations
that make sense for their particular ownership models. The mutations
could have much faster, and inlined, ownership/capacity checks. These
would be used as return values for functions.
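Points 1) and 2) might look roughly like the following. This is only a sketch under assumptions: StringBase, DependentStr, OwnedStr and CountChar are invented names, only two ownership models are shown, and the flags field is omitted. The base class exposes no mutators; each subclass exposes only the mutations that make sense for it, and OwnedStr::Append() needs a single capacity comparison because its ownership is known statically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical immutable base: chars + length, no mutators.
class StringBase {
 public:
  const char* Data() const { return mData; }
  size_t Length() const { return mLength; }

 protected:
  StringBase(const char* aData, size_t aLength)
      : mData(aData), mLength(aLength) {}
  ~StringBase() = default;  // empty, and never invoked through the base
  const char* mData;
  size_t mLength;
};

// Dependent strings get Rebind() but no Append().
class DependentStr : public StringBase {
 public:
  DependentStr(const char* aData, size_t aLength)
      : StringBase(aData, aLength) {}
  void Rebind(const char* aData, size_t aLength) {
    mData = aData;
    mLength = aLength;
  }
};

// Owned strings always have a unique buffer, so Append() only checks capacity.
class OwnedStr : public StringBase {
 public:
  OwnedStr() : StringBase(nullptr, 0), mBuf(nullptr), mCapacity(0) {}
  OwnedStr(const OwnedStr&) = delete;
  OwnedStr& operator=(const OwnedStr&) = delete;
  ~OwnedStr() { std::free(mBuf); }

  void Append(char aChar) {
    if (mLength == mCapacity) Grow();  // the only check on the fast path
    mBuf[mLength++] = aChar;
  }

 private:
  void Grow() {
    size_t capacity = mCapacity ? mCapacity * 2 : 16;
    char* buf = static_cast<char*>(std::realloc(mBuf, capacity));
    if (!buf) std::abort();  // sketch only: no real OOM handling
    mBuf = buf;
    mData = buf;
    mCapacity = capacity;
  }
  char* mBuf;
  size_t mCapacity;
};

// Any string type can be passed where an immutable argument is wanted.
inline size_t CountChar(const StringBase& aStr, char aChar) {
  size_t count = 0;
  for (size_t i = 0; i < aStr.Length(); ++i) {
    if (aStr.Data()[i] == aChar) ++count;
  }
  return count;
}

inline void Example() {
  OwnedStr owned;
  owned.Append('a');
  DependentStr dep("banana", 6);
  assert(CountChar(owned, 'a') == 1 && CountChar(dep, 'a') == 3);
}
```

Calling Append() on a DependentStr, or any mutator through a StringBase reference, is a compile error in this sketch.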
There are, of course, some drawbacks. Specifically, any function
writing to a string needs to effectively declare in its signature what
the ownership model of that string will be. And the different mutable
strings end up with API duplication if they want to expose similar APIs.
Though note that the "similar API" thing only applies to
owned/stack/refcounted strings; dependent strings do not in fact want an
Append(), though they may want a Rebind()... Also, it's not clear to me
what the best way is of handling "copying" a string in this setup; it's
possible that the flags on the base class should in fact indicate the
ownership model at least to the extent that a shareable buffer can be
shared as needed. In practice, that's how nsString behaves in
Gecko-land: shares the incoming buffer if possible, else allocates a new
shareable buffer and copies.
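One possible reading of the "share if possible, else allocate and copy" behavior, as a sketch. Everything here is an assumption for illustration: Source stands in for the base class, a nullable std::shared_ptr stands in for a "my buffer is shareable" flag plus refcounted buffer, and SharedStr is an invented name. It also assumes a shareable source spans its whole buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for the immutable base class. `shareable` is
// non-null only when the chars live in a refcounted buffer.
struct Source {
  const char* data;
  size_t length;
  std::shared_ptr<const std::string> shareable;
};

// Hypothetical refcounted string: copying in shares or allocates.
class SharedStr {
 public:
  explicit SharedStr(const Source& aSrc) {
    if (aSrc.shareable) {
      mBuffer = aSrc.shareable;  // share: bump the refcount, no copy
    } else {
      // Not shareable: allocate a new shareable buffer and copy into it.
      mBuffer = std::make_shared<std::string>(aSrc.data, aSrc.length);
    }
  }
  const char* Data() const { return mBuffer->data(); }
  size_t Length() const { return mBuffer->size(); }

 private:
  std::shared_ptr<const std::string> mBuffer;
};

inline void Example() {
  const char stackChars[] = "world";
  Source dependent{stackChars, 5, nullptr};
  SharedStr copy(dependent);
  // The copy has its own buffer, so it does not alias the stack chars.
  assert(copy.Data() != stackChars && copy.Length() == 5);
}
```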
This is somewhat similar to the "mutable string that you can write to,
then mark as immutable" approach but enforced at compile time by the
type system: if you're a specific string type you're basically a
stringbuilder, and if you're the shared base class you're immutable.
-Boris
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo