[dev-servo] Table-less string interning

2014-04-23 Thread Keegan McAllister
I opened this as a GitHub issue (https://github.com/mozilla/servo/issues/2217) but I thought I'd mention it here as well. The traditional approach for string interning is: * Interned strings are pointers * To intern a string, look in a hash table mapping strings to pointers * If it's not found,

Re: [dev-servo] Table-less string interning

2014-04-23 Thread Robert O'Callahan
It seems like it would be a good idea to instrument Gecko to gather a trace of intern operations from regular Web browsing that you can run against test programs. Making the interned atom for <= 8-char or 4-char ASCII strings be the string itself is a really interesting idea. Another idea I like

Re: [dev-servo] Table-less string interning

2014-04-23 Thread Keegan McAllister
> Another idea I like is to introduce a separate immutable prepopulated (with > tag names, attribute values, CSS properties and keywords, known-to-be popular > values, etc) table that uses the same hash values as the mutable concurrent > table (so we don't have to compute the hash value twice)

Re: [dev-servo] Table-less string interning

2014-04-23 Thread Brian Anderson
On 04/23/2014 04:38 PM, Keegan McAllister wrote: I opened this as a GitHub issue (https://github.com/mozilla/servo/issues/2217) but I thought I'd mention it here as well. Rust already uses SipHash, which is supposed to be fast and crypto-strength. But we would need to run it twice to get 128

Re: [dev-servo] Table-less string interning

2014-04-23 Thread Boris Zbarsky
On 4/23/14, 7:38 PM, Keegan McAllister wrote: * How much does a 128-bit compare cost, vs. 64 bits for a pointer? Are we more or less assuming we're on 64-bit systems? My memory is that the 128-bit IID compares we end up doing in QueryInterface sorta suck, but we do a bunch of them at a time,

Re: [dev-servo] Table-less string interning

2014-04-23 Thread Boris Zbarsky
On 4/23/14, 8:53 PM, Brian Anderson wrote: We're assuming nobody is going to be attacking Rust symbols though Oh, right, that's the other worry. We've had security issues in the past due to things like type="fİle"> being treated as a file input by some parts of the system but not others. So

Re: [dev-servo] Table-less string interning

2014-04-23 Thread Patrick Walton
On 4/23/14 6:06 PM, Boris Zbarsky wrote: On 4/23/14, 8:53 PM, Brian Anderson wrote: We're assuming nobody is going to be attacking Rust symbols though Oh, right, that's the other worry. We've had security issues in the past due to things like being treated as a file input by some parts of th