On 6/5/2013 8:42 PM, Patrick Walton wrote:
Topics covered: Interning, mutability, cost of creating string
objects, encoding UTF-8 versus UTF-16.
https://github.com/mozilla/servo/wiki/Strings
I would love to have been invited to this meeting. Was it announced
anywhere?
I absolutely agree that we shouldn't have a separate atom type. I've
actually been hoping that we could replace nsIAtom in Gecko with a
string flag (INTERNED) that would enable a fast-path comparison, but
the initial patches I wrote to test that bitrotted during the work to
have atoms store both a UTF8 and UTF16 buffer, and I never revisited them.
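
To make the INTERNED-flag idea concrete, here is a rough sketch in Rust. All of the names (`FlaggedStr`, `Interner`, `plain`) are made up for illustration and are not Gecko or Servo APIs, and it assumes a single, non-threadsafe interning table:

```rust
use std::collections::HashSet;
use std::rc::Rc;

// Hypothetical string type: one type for everything, plus an `interned` flag.
#[derive(Clone, Debug)]
struct FlaggedStr {
    buf: Rc<str>,
    interned: bool, // assumption: only `Interner::intern` ever sets this
}

impl PartialEq for FlaggedStr {
    fn eq(&self, other: &Self) -> bool {
        if self.interned && other.interned {
            // Both buffers came from the same table, so equal contents
            // always share one allocation: a pointer compare is enough.
            Rc::ptr_eq(&self.buf, &other.buf)
        } else {
            // At least one side is an ordinary string: compare contents.
            self.buf == other.buf
        }
    }
}

// Hypothetical interning table (single-threaded in this sketch).
#[derive(Default)]
struct Interner {
    table: HashSet<Rc<str>>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> FlaggedStr {
        if let Some(existing) = self.table.get(s) {
            return FlaggedStr { buf: Rc::clone(existing), interned: true };
        }
        let fresh: Rc<str> = Rc::from(s);
        self.table.insert(Rc::clone(&fresh));
        FlaggedStr { buf: fresh, interned: true }
    }
}

// Builds a non-interned string with its own buffer.
fn plain(s: &str) -> FlaggedStr {
    FlaggedStr { buf: Rc::from(s), interned: false }
}
```

Two strings returned by `intern("div")` compare by pointer, while comparing either of them against `plain("div")` falls back to the content comparison, so callers never need to know which kind they hold.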
* Gecko has mutable strings and this is bad for performance
I'd like to understand/challenge this statement. Is this bad for
performance because you have to check the extra SHARED flag on write?
With auto-sharing forcing immutability, I have trouble believing this is
a big deal in practice. The noticeable problem with auto-sharing right
now is that it requires threadsafe refcounting, which *does* show up in
benchmarks, but that would continue to be a problem with immutable
strings, if they needed to be thread-shareable. Was there discussion
about whether these strings (and the interning table) would be
threadsafe at all?
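
For reference, the write-time check I have in mind looks roughly like the following Rust sketch. `SharedString` is a made-up type, not nsString, and `Arc` stands in for the threadsafe refcount (swapping in `Rc` would give the cheaper, non-atomic variant):

```rust
use std::sync::Arc;

// Hypothetical auto-sharing string: cloning bumps a refcount, and a write
// copies the buffer only if another owner still holds it.
#[derive(Clone)]
struct SharedString {
    buf: Arc<String>, // Arc = atomic (threadsafe) reference count
}

impl SharedString {
    fn new(s: &str) -> Self {
        SharedString { buf: Arc::new(s.to_owned()) }
    }

    fn push_str(&mut self, extra: &str) {
        // `Arc::make_mut` plays the role of the SHARED check: it clones the
        // inner String only when another strong reference exists, otherwise
        // it hands back the existing buffer for in-place mutation.
        Arc::make_mut(&mut self.buf).push_str(extra);
    }

    fn as_str(&self) -> &str {
        self.buf.as_str()
    }

    fn is_shared(&self) -> bool {
        Arc::strong_count(&self.buf) > 1
    }
}
```

The per-write cost here is one refcount inspection. The atomic increments and decrements on clone and drop are the part that would remain even if the strings were immutable but still shareable across threads.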
My primary concern with string builders is that they typically
reallocate when you convert the builder to an immutable string. If we
can avoid that case by reassigning the buffer, then I think most of my
objections go away.
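
A minimal sketch of what "reassigning the buffer" could look like, in Rust. `Builder` and `Frozen` are hypothetical names; the only claim is that finishing the builder moves the existing allocation instead of copying it:

```rust
// Hypothetical builder/immutable pair. Both wrap the same heap buffer, so
// finishing the builder only moves (pointer, length, capacity).
struct Builder {
    buf: String,
}

// Exposes no mutating methods, so the contents are effectively immutable.
struct Frozen {
    buf: String,
}

impl Builder {
    fn with_capacity(n: usize) -> Self {
        Builder { buf: String::with_capacity(n) }
    }

    fn append(&mut self, s: &str) -> &mut Self {
        self.buf.push_str(s);
        self
    }

    // Consumes the builder and hands its buffer to the immutable string.
    // No allocation or copy of the character data happens here.
    fn finish(self) -> Frozen {
        Frozen { buf: self.buf }
    }
}

impl Frozen {
    fn as_str(&self) -> &str {
        &self.buf
    }
}
```

The trade-off in this sketch is that the immutable string keeps whatever spare capacity the builder had. Trimming that capacity is exactly the step that can trigger the reallocation I'm worried about.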
Cost of creating string objects
<https://github.com/mozilla/servo/wiki/Strings#cost-of-creating-string-objects>
* Constructors and especially destructors are expensive
* No static typing
* JS string comes in, want to create a Gecko DependentString,
constructor was expensive because it had to check whether it was a
DependentString
* Would be nice to avoid hacks like that
* 3 cases that Gecko has: ref counted versus owned versus dependent
string versus null-terminated versus stack buffer
* Stay as simple as possible, don't add new string types unless
they're really necessary!
I love the sentiments here, and I share your frustration with complicated
systems. But pretty much all of our string hacks, including JS dependent
strings and XPCOM dependent/literal strings, exist because they solved
very noticeable performance problems. JS ropes were added in bug 571549,
for example, which is definitely not ancient history. It's worth
exploring whether we can remove that need by simplifying the string
classes, but I'm very wary of generic advice to "stay as simple as
possible" when our history indicates that simple doesn't perform well.
Was there discussion about whether string buffers should be refcounted
or GCed (or copied, but I'm pretty sure that would cause memory explosion)?
--BDS