Hello, I am a new user of R, so please pardon me.
I am reading a .txt file that has 50+ numeric columns, with '\t' as the separator. I am using read.csv() with colClasses, but it fails to recognize double-quoted numeric values. (My numeric values look like "1,001.23" and "1,008,000.456".) read.csv() stops with:

    scan() expected 'a real', got '"1,044.059"'

What I have tried, and the problems with each:

1) scan() with pipe(), but I get the error shown below. In other words, how do I replace all the double quotes with nothing? I tried enclosing the sed command in single quotes, but that does not help, even though the sed command works from the shell. (A corrected sketch is in the P.S. below.)

    scan(pipe("sed -e s/\"//g DataAll.txt"), sep="\t")
    sh: Syntax error: Unterminated quoted string

2) A solution I found on the mailing list uses setAs(), described here:
http://www.nabble.com/Re%3A--R--read.table()-and-scientific-notation-p6734890.html
(Sketch in the P.S. below.)

3) Other than reading with as.is=TRUE and then calling as.numeric() on the numeric columns, what is the solution? And how do I efficiently convert 50+ columns to numeric using a regular expression? All of my numeric column names start with the character 'X', so how do I use sapply() and/or a regular expression to convert every column whose name starts with X to numeric? Is there an alternative way to do this? (Sketch in the P.S. below.)

Basically, 2 and 3 both work, but which one is the efficient and correct way to do this? (Also, what is the most efficient way to apply field-level validation and conversion while reading a file? Does one have to read the file first, and only then validate and convert?)

Thanks for taking the time to read through this mail.

Thanks and Regards
-Aval
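
P.S. For (1), this is roughly the shell quoting I think I was missing: the sed expression needs its own single quotes inside the R string. The file name DataAll.txt is just my example, and deleting every comma and double quote is only safe if no character column legitimately contains them.

    # single-quote the sed expressions so the shell passes them through intact;
    # strip the surrounding quotes and the thousands separators, then let
    # read.delim() parse the tab-separated result
    dat <- read.delim(pipe("sed -e 's/\"//g' -e 's/,//g' DataAll.txt"))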
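
For (2), my understanding of the setAs() approach from that thread is roughly the sketch below. The class name and the column layout are made up; read.delim() already removes the surrounding quotes with its default quote handling, so the coercion only has to drop the commas. setClass()/setAs() come from the methods package.

    # define a coercion that strips thousands separators before as.numeric()
    setClass("num.with.commas")
    setAs("character", "num.with.commas",
          function(from) as.numeric(gsub(",", "", from)))

    # colClasses must match the real column layout; this is only illustrative
    dat <- read.delim("DataAll.txt",
                      colClasses = c("character",             # e.g. an id column
                                     rep("num.with.commas", 50)))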
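
For (3), this is the kind of thing I had in mind (the file name is again my example, and it assumes, as in my data, that every numeric column name starts with X). The last line is my attempt at after-the-fact field validation: values that fail to convert become NA, so counting NAs per column flags bad fields.

    dat <- read.delim("DataAll.txt", as.is = TRUE)

    # pick out the columns whose names start with X and convert them,
    # dropping the thousands separators first
    xcols <- grep("^X", names(dat))
    dat[xcols] <- lapply(dat[xcols],
                         function(x) as.numeric(gsub(",", "", x)))

    # crude validation: how many values per column failed to convert?
    sapply(dat[xcols], function(x) sum(is.na(x)))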