On Fri 30 Apr 2021 at 19:17:55 (-0400), Stefan Monnier wrote:
> > Now I wonder how this might enable random access to the nth
> > character. I will keep looking around.
>
> Another part of the question is: why would someone give you the position
> information in terms of characters rather than in terms of (say) bytes,
> or words, or ...

I thought we'd scotched bytes, because the number of bytes in a
character depends on the encoding: the simple letter A occupies 1 byte
in ASCII and UTF-8, 2 bytes in UTF-16, and 4 in UTF-32.

> or words,

Counting words raises the problem of compound words, because people
spell them differently: as separate words with spaces, or hyphenated,
or as a single word. In English, compound words tend to evolve in that
order, as we grow comfortable with seeing the words joined together.
So the same sentence might have fewer words in it just because it was
written in more modern times.

> or ...

It's fairly usual to count characters, ignoring whitespace and
punctuation. There are still complications for languages like Welsh,
where there are single letters like ch and ll. I remember, as a small
child on holiday, learning to spell, pronounce and translate the
58-letter place name
Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch
and not realising for years that it was only 51 letters long, and
didn't really contain a 4-letter repeat.

BTW I, too, naturally googled A-O F, and found the same telescopic
reference as rhkramer, though looking once again, I see this thread is
starting to add far more hits for the term (which I won't repeat here).

Cheers,
David.
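
For anyone who wants to check the two counts above, here is a minimal
Python sketch. The function names are made up for illustration, and the
letter counter is only an assumption about how one might count: it
treats just ch and ll as single letters (the two mentioned above) and
makes no attempt at the other Welsh digraphs, so it is not a general
Welsh tokenizer.

```python
import re

# The place name quoted above, as typed (58 characters).
NAME = "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"


def encoded_sizes(ch):
    """Return the number of bytes one character takes in several encodings."""
    # The "-le" variants are used so that Python does not prepend a
    # byte-order mark, which would inflate the count.
    encodings = ("ascii", "utf-8", "utf-16-le", "utf-32-le")
    return {name: len(ch.encode(name)) for name in encodings}


def welsh_letter_count(word, digraphs=("ch", "ll")):
    """Count letters, treating each listed digraph as one letter.

    Hypothetical helper: scans left to right and prefers a digraph
    over a single character wherever one matches.
    """
    pattern = "|".join(re.escape(d) for d in digraphs) + "|."
    return len(re.findall(pattern, word.lower()))


if __name__ == "__main__":
    print(encoded_sizes("A"))
    print(len(NAME), welsh_letter_count(NAME))
```

Because the scan is left to right, the run "llll" in the middle of the
name is read as two ll letters rather than four separate l's.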