On 9/2/13 13:36, Joshua Cranmer 🐧 wrote:
> I don't think there *is* a sane approach that satisfies everybody.
> Either you break "UTF8-just-works-everywhere", you break legacy
> content, or you make parsing take an inordinate amount of time...
I want to push back on this last point a bit. Using a straightforward UTF-8 detection algorithm (which could probably stand some optimization), it takes my laptop somewhere between 0.9 ms and 1.4 ms to scan a _megabyte_ buffer to check whether it consists entirely of valid UTF-8 sequences (the variation depends on what proportion of the characters in the buffer are above U+007F). That hardly even rises to the level of noise.
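For concreteness, here is a minimal, unoptimized sketch of the kind of detection loop I mean (not the exact code I timed; the function name and harness are just for illustration). It walks the buffer byte by byte, checking lead bytes, continuation bytes, and the overlong/surrogate/out-of-range cases, then times a scan over a 1 MiB all-ASCII buffer:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Returns 1 if buf[0..len) is entirely well-formed UTF-8, else 0. */
static int is_valid_utf8(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t b = buf[i];
        if (b < 0x80) {                          /* ASCII */
            i++;
        } else if ((b & 0xE0) == 0xC0) {         /* 2-byte sequence */
            if (b < 0xC2 ||                      /* reject overlong C0/C1 */
                i + 1 >= len ||
                (buf[i+1] & 0xC0) != 0x80)
                return 0;
            i += 2;
        } else if ((b & 0xF0) == 0xE0) {         /* 3-byte sequence */
            if (i + 2 >= len ||
                (buf[i+1] & 0xC0) != 0x80 ||
                (buf[i+2] & 0xC0) != 0x80 ||
                (b == 0xE0 && buf[i+1] < 0xA0) || /* overlong */
                (b == 0xED && buf[i+1] > 0x9F))   /* UTF-16 surrogates */
                return 0;
            i += 3;
        } else if ((b & 0xF8) == 0xF0) {         /* 4-byte sequence */
            if (b > 0xF4 ||                      /* beyond U+10FFFF */
                i + 3 >= len ||
                (buf[i+1] & 0xC0) != 0x80 ||
                (buf[i+2] & 0xC0) != 0x80 ||
                (buf[i+3] & 0xC0) != 0x80 ||
                (b == 0xF0 && buf[i+1] < 0x90) || /* overlong */
                (b == 0xF4 && buf[i+1] > 0x8F))   /* beyond U+10FFFF */
                return 0;
            i += 4;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }
    }
    return 1;
}

int main(void)
{
    size_t len = 1 << 20;                        /* 1 MiB */
    uint8_t *buf = malloc(len);
    if (!buf)
        return 1;
    memset(buf, 'a', len);                       /* all-ASCII case */

    clock_t start = clock();
    int ok = is_valid_utf8(buf, len);
    clock_t end = clock();

    printf("valid=%d, scan took %.3f ms\n", ok,
           (end - start) * 1000.0 / CLOCKS_PER_SEC);
    free(buf);
    return 0;
}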
--
Adam Roach
Principal Platform Engineer
a...@mozilla.com
+1 650 903 0800 x863