'A character that is not UTF-8' is not a well-defined concept. To define 'character' you need to know the encoding, since that determines how the bytes are split into characters. So only a whole string can be valid UTF-8 or not. You can say which bytes in a stream of bytes would be valid in UTF-8, but if not all of them are, then it would almost certainly be incorrect to interpret any of them as UTF-8.
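As an illustration of the point above, here is a minimal sketch in R. The sample bytes and the guess that the source encoding is Latin-1 are assumptions made for the example, not something known about the original spreadsheet, and validUTF8() is only available in more recent versions of R.

```r
# Hypothetical input: "cafe" with an accented e, stored as Latin-1 bytes
# (0xE9 is the accented e in Latin-1, but is not valid UTF-8 on its own).
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# The same word stored as UTF-8 bytes (the accented e is 0xC3 0xA9).
y <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9)))

# validUTF8() tests each whole string, independently of the locale.
valid <- validUTF8(c(x, y))

# Assumption: the bytes really are Latin-1. Re-encoding the whole string
# keeps the character instead of deleting the "invalid" byte.
fixed <- iconv(x, from = "latin1", to = "UTF-8")
fixed_bytes <- charToRaw(fixed)
```

If the assumption about the source encoding holds, `valid` flags only the first string as invalid, and `fixed` holds the same bytes as `y`. Re-encoding from the true encoding is usually safer than stripping bytes, for the reason given above.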
In a UTF-8 locale you can find out whether a stream of bytes is valid by calling nchar(x, "chars", allowNA=TRUE) and testing for NA elements in the result.

On Fri, 26 Oct 2007, Bos, Roger wrote:

> All,
>
> I am trying to post text from an XLS spread to my wiki and I need to
> remove any characters that are not UTF-8. Is there an easy gsub command
> that can do this?
>
> (I previously sent this same email to r-sig-gui. That was a mistake and
> I apologize for the duplication.)
>
> Thanks, Roger J. Bos

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595