Re: [Rd] locales and readLines

2007-09-03 Thread Martin Morgan
Thank you very much for explaining this. I had indeed overlooked the 'encoding' argument of 'file'. I also appreciate how unsatisfactory guessing at the encoding can be, and that scanning the entire file is not appropriate for large files or general connections. Sorry that 'burden' came across as nega…
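As a minimal sketch (not code from the thread), here are two ways of telling current R that a file is latin1. The first declares the encoding on the connection, so R re-encodes the input as it reads; this assumes the session's native encoding, such as UTF-8, can represent every character in the file. The second reads the bytes unchanged and only marks them, using readLines' own 'encoding' argument. The temporary file and its bytes are a made-up example.

```r
# Create a small file whose bytes are latin1: "café" followed by a newline
path <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), path)

# Option 1: declare the encoding on the connection. R re-encodes the input
# to the session's native encoding while reading, so this assumes the
# native encoding (e.g. UTF-8) can represent every character in the file.
con <- file(path, encoding = "latin1")
native_lines <- readLines(con)
close(con)

# Option 2: read the bytes unchanged and only mark them as latin1;
# no re-encoding happens at read time.
marked_lines <- readLines(path, encoding = "latin1")

Encoding(marked_lines)  # "latin1"
```

Option 2 does not depend on the locale at read time, because R translates marked strings when it needs to, for example when comparing them. Option 1 yields plain native strings, which is convenient only when the locale can hold them.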

Re: [Rd] locales and readLines

2007-09-03 Thread Prof Brian Ripley
I think you need to delimit more precisely what you want to do. It is difficult in general to tell what encoding a text file is in, and very much harder if this is a data file containing only a small proportion of non-ASCII text, which might not even be words in a human language (but abbreviations…
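The difficulty described above can be illustrated with a hypothetical helper, looks_like_utf8() (not part of R and not from the thread), that performs the only cheap check available: whether the bytes are valid UTF-8. A FALSE rules out UTF-8 but does not show that the file is latin1 (it could be cp1252 or another single-byte encoding). A TRUE is not proof either, since some latin1 byte sequences are also valid UTF-8. This sketch assumes R 3.3.0 or later for validUTF8().

```r
# Hypothetical helper (not part of R): reports whether a file's bytes are
# valid UTF-8. This cannot identify the encoding; it can only rule UTF-8 out.
looks_like_utf8 <- function(path) {
  # Reads the whole file; for large files a sample might be used instead
  bytes <- readBin(path, what = "raw", n = file.size(path))
  codes <- as.integer(bytes)
  codes <- codes[codes != 0L]            # rawToChar() rejects embedded NULs
  if (all(codes < 128L)) return(NA)      # pure ASCII: every candidate fits
  validUTF8(rawToChar(as.raw(codes)))    # validUTF8() needs R >= 3.3.0
}

latin1_file <- tempfile()
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), latin1_file)        # "café" in latin1
utf8_file <- tempfile()
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9, 0x0a)), utf8_file)    # "café" in UTF-8

looks_like_utf8(latin1_file)  # FALSE
looks_like_utf8(utf8_file)    # TRUE
```

Returning NA for pure-ASCII input reflects the point about data files: if only a few bytes are non-ASCII, there is very little evidence either way.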

[Rd] locales and readLines

2007-08-31 Thread Martin Morgan
R-developers, I'm looking for some 'best practices', or perhaps an upstream solution (I have a sense of déjà vu about this, so sorry if it's already been asked). Problems occur when a file is encoded as latin1 but the user has a UTF-8 locale (or, I guess, more generally when the input locale does not match…