If you don't know ahead of time how many columns there are, only that they are a mix of numeric and character columns (the latter to be converted to factors), then you can do this:

DF <- read.table(textConnection(Input), header = TRUE, as.is = TRUE)
f <- function(x) if (is.character(x)) factor(x, levels = unique(x)) else x
DF[] <- lapply(DF, f)
DF

On 9/19/07, Sébastien <[EMAIL PROTECTED]> wrote:
> Hi Gabor,
>
> I am coming back to you about the method you described to me a month ago to
> define the level order during a read.table call. I initially thought that I
> would need to apply the 'unique' function on a single column of my dataset,
> so I only used it after the read.table step (to make my life easier)...
> Well, I was wrong: I need to reorder all my columns (just to remind you, I
> don't know the numbers of columns my code has to handle). So, here come
> troubles.
>
> I first tried to apply your code as is, although I thought there might be
> some problems. The class can actually not be recycled, when a list notation
> is used (the help says that "colClasses character. A vector of classes to be
> assumed for the columns. Recycled as necessary..."). See the following
> example:
>
> ######################
>
> library(methods)
> setClass("my.factor")
> setAs("character", "my.factor",
>   function(from) factor(from, levels = unique(from)))
>
> Input<-"a b c d
> 1 1 175 n f
> 2 2 102 n j
> 3 3 187 o n
> 4 4 106 u g
> 5 5 102 o v
> 6 6 133 l x
> 7 7 149 w q
> 8 8 122 x p
> 9 9 151 u r
> 10 10 134 e g
> 11 11 170 j q
> 12 12 103 v n
> 13 13 153 n w
> 14 14 106 x x
> 15 15 185 v x
> 16 16 102 s p
> 17 17 181 i h
> 18 18 192 o k
> 19 19 161 d f
> 20 20 158 n q
> "
>
> DF <- read.table(textConnection(Input), header = TRUE,
>   colClasses = list(c=("my.factor")))
> levels(DF$c) # properly ordered
> levels(DF$d) # not reordered
>
> ######################
>
> I also tried that:
>
> ######################
>
> DF <- read.table(textConnection(Input), header = TRUE,
>   colClasses = c("my.factor"))
> levels(DF$c)
> levels(DF$d)
>
> ######################
>
> In this case, the class is definitely
> recycled as all the columns of DF are transformed into factors... Not
> really useful :)
>
> I tried to modify the content of the list or my second notation, by
> including "integer" or a second "my.factor"... but I did not have much
> success.
>
> Any idea how to use the class "my.factor" multiple times?
>
> Thanks in advance
>
> Gabor Grothendieck wrote:
>
> It's the same principle. Just change the function to be suitable. This
> one arranges the levels according to the input:
>
> library(methods)
> setClass("my.factor")
> setAs("character", "my.factor",
>   function(from) factor(from, levels = unique(from)))
>
> Input <- "a b c
> 1 1 176 w
> 2 2 141 k
> 3 3 172 r
> 4 4 182 s
> 5 5 123 k
> 6 6 153 p
> 7 7 176 l
> 8 8 170 u
> 9 9 140 z
> 10 10 194 s
> 11 11 164 j
> 12 12 100 j
> 13 13 127 x
> 14 14 137 r
> 15 15 198 d
> 16 16 173 j
> 17 17 113 x
> 18 18 144 w
> 19 19 198 q
> 20 20 122 f
> "
>
> DF <- read.table(textConnection(Input), header = TRUE,
>   colClasses = list(c = "my.factor"))
> str(DF)
>
> On 8/28/07, Sébastien <[EMAIL PROTECTED]> wrote:
>
> Ok, I cannot send you one of my datasets since they are confidential.
> But I can produce a dummy "mini" dataset to illustrate my question. Let's
> say I have a csv file with 3 columns and 20 rows whose content is
> reproduced by the following line.
>
> mydata<-data.frame(a=1:20,
>   b=sample(100:200,20,replace=T),c=sample(letters[1:26], 20, replace = T))
> mydata
>     a   b c
> 1   1 176 w
> 2   2 141 k
> 3   3 172 r
> 4   4 182 s
> 5   5 123 k
> 6   6 153 p
> 7   7 176 l
> 8   8 170 u
> 9   9 140 z
> 10 10 194 s
> 11 11 164 j
> 12 12 100 j
> 13 13 127 x
> 14 14 137 r
> 15 15 198 d
> 16 16 173 j
> 17 17 113 x
> 18 18 144 w
> 19 19 198 q
> 20 20 122 f
>
> If I had to read the csv file, I would use something like:
>
> mydata<-data.frame(read.table(file="c:/test.csv",header=T))
>
> Now, if you look at mydata$c, the levels are alphabetically ordered.
> mydata$c
>  [1] w k r s k p l u z s j j x r d j x w q f
> Levels: d f j k l p q r s u w x z
>
> What I am trying to do is to reorder the levels so as to have them in the
> order they appear in the table, i.e.
>
> Levels: w k r s p l u z j x d q f
>
> Again, keep in mind that my script should be used on datasets whose content
> is unknown to me. In my example, I have used letters for mydata$c, but my
> code may have to handle factors of numeric or character values (I need to
> transform specific columns of my dataset into factors for plotting
> purposes). My goal is to let the code scan the content of each factor of my
> data.frame during or after the read.table step and reorder their levels
> automatically without having to ask the user to hard-code the level order.
> In a way, my problem is more related to the way the factor levels are
> ordered than to the read.table function, although I guess there is a link...
>
> Gabor Grothendieck wrote:
>
> It's not clear from your description what you want. Could you be a bit more
> specific, including an example?
>
> On 8/28/07, Sébastien <[EMAIL PROTECTED]> wrote:
>
> Thanks Gabor, I have two questions:
>
> 1- Is there any difference between your code and the following one, with
> regards to Fld2?
>
> ### test ###
>
> Input <- "Fld1 Fld2
> 10 A
> 20 B
> 30 C
> 40 A
> "
> DF <- read.table(textConnection(Input), header = TRUE)
> DF$Fld2<-factor(DF$Fld2,levels= c("C", "A", "B"))
>
> 2- Do you see any way to bring flexibility to your method? Because it looks
> to me as if, at this stage, I have to i) know the order of my levels before
> I read the table and ii) create one class per factor. My problem is that I
> am not really working on a specific dataset. My goal is to develop R scripts
> capable of handling datasets which have various contents but close
> structures. So, I really need to minimize the quantity of "user-specific"
> code.
>
> Sebastien
>
> Gabor Grothendieck wrote:
>
> You can create your own class and pass that to read table.
> In the example below Fld2 is read in with factor levels C, A, B in that
> order.
>
> library(methods)
> setClass("my.levels")
> setAs("character", "my.levels",
>   function(from) factor(from, levels = c("C", "A", "B")))
>
> ### test ###
>
> Input <- "Fld1 Fld2
> 10 A
> 20 B
> 30 C
> 40 A
> "
> DF <- read.table(textConnection(Input), header = TRUE,
>   colClasses = c("numeric", "my.levels"))
> str(DF)
> # or
> DF <- read.table(textConnection(Input), header = TRUE,
>   colClasses = list(Fld2 = "my.levels"))
> str(DF)
>
> On 8/28/07, Sébastien <[EMAIL PROTECTED]> wrote:
>
> Dear R-users,
>
> I have found this not-so-recent post in the archives -
> http://tolstoy.newcastle.edu.au/R/devel/00a/0291.html - while I was
> looking for a particular way to reorder factor levels. The question
> addressed by the author was to know if the read.table function could be
> modified to order the levels of newly created factors "according to the
> order that they appear in the data file". Exactly what I am looking for.
> As there was no reply to this post, I wonder if any move has been made
> towards the implementation of this suggestion. A quick look at ?read.table
> tells me that if this option was implemented, it was not in the read.table
> function...
>
> Sebastien
>
> PS: I am sorry to post so many messages on the list, but I am learning R
> (basically by trials & errors ;-) ) and no one around me has even a
> slight notion about it...
>
> ______________________________________________
> [EMAIL PROTECTED] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
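
For anyone who wants to try the lapply approach from the top of this message on its own, here is a minimal self-contained sketch. The input data below is made up for illustration (it is not one of the datasets from the thread), and it assumes, as in the thread, that the first field of each row is a row name and that character columns are the ones to turn into factors.

```r
# Illustrative input (made up for this sketch): a row-name field,
# two numeric columns (a, b) and two character columns (c, d).
Input <- "a b c d
1 1 175 w f
2 2 102 k j
3 3 187 w f
4 4 106 r g"

# as.is = TRUE keeps character columns as character, so read.table
# does not build factors with alphabetically sorted levels.
DF <- read.table(textConnection(Input), header = TRUE, as.is = TRUE)

# Turn each character column into a factor whose levels follow the
# order of first appearance; leave every other column unchanged.
f <- function(x) if (is.character(x)) factor(x, levels = unique(x)) else x

# DF[] <- ... replaces the columns while keeping DF a data frame.
DF[] <- lapply(DF, f)

levels(DF$c)  # "w" "k" "r"
levels(DF$d)  # "f" "j" "g"
str(DF)
```

Because f() is applied to every column, this needs no knowledge of how many columns there are or which ones are character.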