On 9/2/13 13:36, Joshua Cranmer 🐧 wrote:
> I don't think there *is* a sane approach that satisfies everybody.
> Either you break "UTF8-just-works-everywhere", you break legacy
> content, or you make parsing take an inordinate amount of time...
I want to push back on this last point a bit. Using a straightforward UTF-8 detection algorithm (which could probably stand some optimization), it takes my laptop somewhere between 0.9 ms and 1.4 ms to scan a _megabyte_ buffer to check whether it consists entirely of valid UTF-8 sequences (the variation depends on what proportion of the characters in the buffer are above U+007F). That hardly even rises to the level of noise.
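For concreteness, here is a minimal, unoptimized sketch of the kind of detection loop I mean (not the exact code I timed; the function name and harness are just for illustration). It walks the buffer byte by byte, checking lead bytes, continuation bytes, and the overlong/surrogate/out-of-range cases, then times a scan over a 1 MiB all-ASCII buffer:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Returns 1 if buf[0..len) is entirely well-formed UTF-8, else 0. */
static int is_valid_utf8(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t b = buf[i];
        if (b < 0x80) {                          /* ASCII */
            i++;
        } else if ((b & 0xE0) == 0xC0) {         /* 2-byte sequence */
            if (b < 0xC2 ||                      /* reject overlong C0/C1 */
                i + 1 >= len ||
                (buf[i+1] & 0xC0) != 0x80)
                return 0;
            i += 2;
        } else if ((b & 0xF0) == 0xE0) {         /* 3-byte sequence */
            if (i + 2 >= len ||
                (buf[i+1] & 0xC0) != 0x80 ||
                (buf[i+2] & 0xC0) != 0x80 ||
                (b == 0xE0 && buf[i+1] < 0xA0) || /* overlong */
                (b == 0xED && buf[i+1] > 0x9F))   /* UTF-16 surrogates */
                return 0;
            i += 3;
        } else if ((b & 0xF8) == 0xF0) {         /* 4-byte sequence */
            if (b > 0xF4 ||                      /* beyond U+10FFFF */
                i + 3 >= len ||
                (buf[i+1] & 0xC0) != 0x80 ||
                (buf[i+2] & 0xC0) != 0x80 ||
                (buf[i+3] & 0xC0) != 0x80 ||
                (b == 0xF0 && buf[i+1] < 0x90) || /* overlong */
                (b == 0xF4 && buf[i+1] > 0x8F))   /* beyond U+10FFFF */
                return 0;
            i += 4;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }
    }
    return 1;
}

int main(void)
{
    size_t len = 1 << 20;                        /* 1 MiB */
    uint8_t *buf = malloc(len);
    if (!buf)
        return 1;
    memset(buf, 'a', len);                       /* all-ASCII case */

    clock_t start = clock();
    int ok = is_valid_utf8(buf, len);
    clock_t end = clock();

    printf("valid=%d, scan took %.3f ms\n", ok,
           (end - start) * 1000.0 / CLOCKS_PER_SEC);
    free(buf);
    return 0;
}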
--
Adam Roach
Principal Platform Engineer
a...@mozilla.com
+1 650 903 0800 x863