On 6/5/2013 8:42 PM, Patrick Walton wrote:
> Topics covered: Interning, mutability, cost of creating string objects, encoding (UTF-8 versus UTF-16).
>
> https://github.com/mozilla/servo/wiki/Strings
I would love to have been invited to this meeting. Was it announced anywhere?

I absolutely agree that we shouldn't have a separate atom type. I've actually been hoping that we could replace nsIAtom in Gecko with a string flag (INTERNED) that would allow short-circuited comparisons. However, the initial patches I wrote to test that were bitrotted by the work to have atoms store both a UTF-8 and a UTF-16 buffer, and I never revisited it.
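To make the INTERNED-flag idea concrete, here is a minimal Rust sketch. It assumes a single-threaded intern table and one string type. `FlaggedStr` and `Interner` are hypothetical names, not Gecko or Servo APIs. The flag lets equality take a pointer-comparison fast path when both sides are interned, and fall back to comparing bytes otherwise.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical string type: one buffer type plus a flag recording whether
/// the buffer came from the intern table (no separate atom type).
#[derive(Clone, Debug)]
pub struct FlaggedStr {
    buf: Rc<str>,
    interned: bool,
}

impl FlaggedStr {
    /// Creates an ordinary, non-interned string.
    pub fn new(s: &str) -> Self {
        FlaggedStr { buf: Rc::from(s), interned: false }
    }

    pub fn is_interned(&self) -> bool {
        self.interned
    }

    pub fn as_str(&self) -> &str {
        &self.buf
    }
}

impl PartialEq for FlaggedStr {
    fn eq(&self, other: &Self) -> bool {
        if self.interned && other.interned {
            // Fast path: the table holds one buffer per distinct value,
            // so two interned strings are equal iff they share a buffer.
            Rc::ptr_eq(&self.buf, &other.buf)
        } else {
            // Slow path: compare the contents.
            *self.buf == *other.buf
        }
    }
}

impl Eq for FlaggedStr {}

/// Hypothetical intern table; assumes one table per thread.
#[derive(Default)]
pub struct Interner {
    table: HashSet<Rc<str>>,
}

impl Interner {
    /// Returns the shared buffer for `s`, inserting it on first use.
    pub fn intern(&mut self, s: &str) -> FlaggedStr {
        if let Some(existing) = self.table.get(s) {
            return FlaggedStr { buf: Rc::clone(existing), interned: true };
        }
        let fresh: Rc<str> = Rc::from(s);
        self.table.insert(Rc::clone(&fresh));
        FlaggedStr { buf: fresh, interned: true }
    }
}
```

Whether the flag check costs less than the pointer comparison it enables would, of course, need measuring.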

>   * Gecko has mutable strings and this is bad for performance

I'd like to understand, or challenge, this statement. Is this bad for performance because you have to check the extra SHARED flag on every write? Since auto-sharing already forces shared buffers to be immutable, I have trouble believing this is a big deal in practice. The noticeable problem with auto-sharing right now is that it requires threadsafe refcounting, which *does* show up in benchmarks, but that would remain a problem with immutable strings if they needed to be shareable across threads. Was there discussion about whether these strings (and the interning table) would be threadsafe at all?
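For reference, here is a rough sketch of what auto-sharing with a check on write amounts to. It uses Rust's `Arc::make_mut`, which copies the buffer only when another handle shares it. `CowString` is a hypothetical name, and the strong-count test stands in for Gecko's SHARED flag. `Arc` is used because it is the thread-shareable choice, and that is exactly where the atomic refcounting cost comes from.

```rust
use std::sync::Arc;

/// Hypothetical auto-sharing string: cloning shares the buffer, and the
/// first write to a shared buffer copies it (copy-on-write). Arc uses
/// atomic refcounts so handles can cross threads; Rc would be cheaper
/// but thread-local.
#[derive(Clone, Debug)]
pub struct CowString {
    buf: Arc<String>,
}

impl CowString {
    pub fn new(s: &str) -> Self {
        CowString { buf: Arc::new(s.to_owned()) }
    }

    /// Stand-in for the SHARED flag: more than one handle owns the buffer.
    pub fn is_shared(&self) -> bool {
        Arc::strong_count(&self.buf) > 1
    }

    /// The check on write: make_mut clones the buffer only if it is
    /// shared; otherwise it mutates in place.
    pub fn push_str(&mut self, s: &str) {
        Arc::make_mut(&mut self.buf).push_str(s);
    }

    pub fn as_str(&self) -> &str {
        &self.buf
    }
}
```

The per-write cost here is one refcount load. Cloning and dropping handles costs the atomic increments and decrements, and those would still apply to an immutable string type.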

My primary concern with string builders is that they typically reallocate when you convert the builder to an immutable string. If we can avoid that by handing the builder's buffer over to the immutable string, then I think most of my objections go away.
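Here is a minimal sketch of the no-reallocation case. It assumes the immutable form is a refcounted handle that takes ownership of the builder's existing buffer. `StrBuilder` is a hypothetical name. Moving the `String` into `Rc::new` moves only its (pointer, capacity, length) header, not the bytes. By contrast, `Rc<str>::from(String)` copies the bytes into a new allocation, and `String::into_boxed_str` discards excess capacity, which may reallocate.

```rust
use std::rc::Rc;

/// Hypothetical builder: an ordinary growable String underneath.
pub struct StrBuilder {
    buf: String,
}

impl StrBuilder {
    pub fn with_capacity(n: usize) -> Self {
        StrBuilder { buf: String::with_capacity(n) }
    }

    pub fn push_str(&mut self, s: &str) -> &mut Self {
        self.buf.push_str(s);
        self
    }

    /// Address of the current byte buffer, used to check that it is reused.
    pub fn buffer_ptr(&self) -> *const u8 {
        self.buf.as_ptr()
    }

    /// Freezes the builder into an immutable, refcounted string. Only the
    /// String header moves into the Rc allocation; the bytes stay put.
    pub fn finish(self) -> Rc<String> {
        Rc::new(self.buf)
    }
}
```

The trade-off is an extra indirection (the `Rc` points at a header that points at the bytes), which is the usual price for reusing the buffer as-is.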


  
> Cost of creating string objects
>
>   * Constructors and especially destructors are expensive
>   * No static typing
>   * JS string comes in, want to create a Gecko DependentString,
>     constructor was expensive because it had to check whether it was a
>     DependentString
>   * Would be nice to avoid hacks like that
>   * 3 cases that Gecko has: refcounted versus owned versus dependent
>     string versus null-terminated versus stack buffer
>   * Stay as simple as possible, don't add new string types unless
>     they're really necessary!
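As a point of comparison for the constructor cost described in these notes, here is a sketch of how static typing could carry the dependent/owned/refcounted distinction, assuming a Rust-style borrow checker. `GStr` is hypothetical; it is not a Gecko or Servo type. A dependent string is just a borrow whose lifetime the compiler checks, so constructing one involves no runtime "is this dependent?" test, and dropping a `Dependent` value frees nothing.

```rust
use std::rc::Rc;

/// Hypothetical three-way string, roughly mirroring the Gecko cases above.
#[derive(Clone, Debug)]
pub enum GStr<'a> {
    /// Dependent: borrows someone else's buffer (e.g. a JS string's chars);
    /// the lifetime 'a guarantees at compile time that it cannot outlive it.
    Dependent(&'a str),
    /// Owned: sole owner of a heap buffer, freed on drop.
    Owned(String),
    /// Shared: refcounted buffer, freed when the last handle drops.
    Shared(Rc<str>),
}

impl<'a> GStr<'a> {
    /// Uniform read access, whichever representation is in use.
    pub fn as_str(&self) -> &str {
        match self {
            GStr::Dependent(s) => *s,
            GStr::Owned(s) => s.as_str(),
            GStr::Shared(s) => &**s,
        }
    }

    pub fn is_dependent(&self) -> bool {
        matches!(self, GStr::Dependent(_))
    }
}
```

The representation is chosen by whichever variant the caller constructs, so the choice is visible in the types rather than discovered at run time.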

I love the sentiments here, and I share your frustration with complicated systems. But pretty much all of our string hacks, including JS dependent strings and XPCOM dependent/literal strings, exist because they solved very noticeable performance problems. JS ropes were added in bug 571549, for example, which is definitely not ancient history. It's worth exploring whether we can remove that need by simplifying the string classes, but I'm very wary of generic advice to "stay as simple as possible" when we have prior history indicating that simple doesn't perform well.

Was there discussion about whether string buffers should be refcounted or GCed (or copied, but I'm pretty sure that would cause memory explosion)?

--BDS

_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo
