Hi Gabor,

I am coming back to you about the method you described to me a month ago for defining the level order during a read.table call. I initially thought that I would only need to apply the 'unique' function to a single column of my dataset, so I only used it after the read.table step (to make my life easier)... Well, I was wrong: I need to reorder the levels of all my columns (just to remind you, I don't know the number of columns my code has to handle). So, here come the troubles.
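One possible post-read route is sketched below. It is not the colClasses method from this thread, and the helper name relevel_by_appearance() and its cols argument are hypothetical. The helper loops over whatever columns the data frame has, so the number of columns does not need to be known in advance. By default it converts every character or factor column. Alternatively, it converts only the columns named in cols, which can include numeric ones. Each converted column becomes a factor whose levels follow the order of first appearance. stringsAsFactors = FALSE is passed explicitly because its default changed to FALSE in R 4.0.0.

```r
# Sketch only: relevel_by_appearance() is a hypothetical helper, not base R.
# It turns the selected columns into factors whose levels follow the order
# in which values first appear in the data.
relevel_by_appearance <- function(df, cols = NULL) {
  if (is.null(cols)) {
    # Default: every character or factor column, however many there are
    is_text <- vapply(df, function(x) is.character(x) || is.factor(x),
                      logical(1))
    cols <- names(df)[is_text]
  }
  for (nm in cols) {
    x <- as.character(df[[nm]])  # numeric columns become character labels
    df[[nm]] <- factor(x, levels = unique(x))
  }
  df
}

# Usage on the first rows of the example data (first field = row names)
Input <- "a b c d
1 1 175 n f
2 2 102 n j
3 3 187 o n
4 4 106 u g
"
DF <- read.table(text = Input, header = TRUE, stringsAsFactors = FALSE)
DF2 <- relevel_by_appearance(DF)
levels(DF2$c)  # "n" "o" "u"
levels(DF2$d)  # "f" "j" "n" "g"
levels(relevel_by_appearance(DF, cols = "b")$b)  # "175" "102" "187" "106"
```

Because the conversion happens after type.convert() has run, numeric values are relabelled from their parsed form. For example, "1.0" and "1" in the file would end up as a single level "1".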
I first tried to apply your code as is, although I thought there might be some problems. When the list notation is used, the class is actually not recycled: only the named column gets it (the help page says that "colClasses character. A *vector* of classes to be assumed for the columns. Recycled as necessary..."). See the following example:

######################
library(methods)
setClass("my.factor")
setAs("character", "my.factor",
      function(from) factor(from, levels = unique(from)))

Input <- "a b c d
1 1 175 n f
2 2 102 n j
3 3 187 o n
4 4 106 u g
5 5 102 o v
6 6 133 l x
7 7 149 w q
8 8 122 x p
9 9 151 u r
10 10 134 e g
11 11 170 j q
12 12 103 v n
13 13 153 n w
14 14 106 x x
15 15 185 v x
16 16 102 s p
17 17 181 i h
18 18 192 o k
19 19 161 d f
20 20 158 n q
"
DF <- read.table(textConnection(Input), header = TRUE,
                 colClasses = list(c = "my.factor"))
levels(DF$c)  # properly ordered
levels(DF$d)  # not reordered
######################

I also tried this:

######################
DF <- read.table(textConnection(Input), header = TRUE,
                 colClasses = c("my.factor"))
levels(DF$c)
levels(DF$d)
######################

In this case, the class is definitely recycled, as all the columns of DF are transformed into factors... Not really useful :)

I tried to modify the content of the list, or my second notation, by including "integer" or a second "my.factor"... but I did not have much success. Any idea how to use the class "my.factor" multiple times?

Thanks in advance

Gabor Grothendieck wrote:
> It's the same principle. Just change the function to be suitable.
> This one arranges the levels according to the input:
>
> library(methods)
> setClass("my.factor")
> setAs("character", "my.factor",
>       function(from) factor(from, levels = unique(from)))
>
> Input <- "a b c
> 1 1 176 w
> 2 2 141 k
> 3 3 172 r
> 4 4 182 s
> 5 5 123 k
> 6 6 153 p
> 7 7 176 l
> 8 8 170 u
> 9 9 140 z
> 10 10 194 s
> 11 11 164 j
> 12 12 100 j
> 13 13 127 x
> 14 14 137 r
> 15 15 198 d
> 16 16 173 j
> 17 17 113 x
> 18 18 144 w
> 19 19 198 q
> 20 20 122 f
> "
> DF <- read.table(textConnection(Input), header = TRUE,
>                  colClasses = list(c = "my.factor"))
> str(DF)
>
>
> On 8/28/07, Sébastien <[EMAIL PROTECTED]> wrote:
>
>> OK, I cannot send you one of my datasets since they are confidential. But I can produce a dummy "mini" dataset to illustrate my question. Let's say I have a csv file with 3 columns and 20 rows whose content is reproduced by the following lines:
>>
>> > mydata <- data.frame(a = 1:20,
>> +                      b = sample(100:200, 20, replace = TRUE),
>> +                      c = sample(letters[1:26], 20, replace = TRUE))
>> > mydata
>>    a   b c
>> 1   1 176 w
>> 2   2 141 k
>> 3   3 172 r
>> 4   4 182 s
>> 5   5 123 k
>> 6   6 153 p
>> 7   7 176 l
>> 8   8 170 u
>> 9   9 140 z
>> 10 10 194 s
>> 11 11 164 j
>> 12 12 100 j
>> 13 13 127 x
>> 14 14 137 r
>> 15 15 198 d
>> 16 16 173 j
>> 17 17 113 x
>> 18 18 144 w
>> 19 19 198 q
>> 20 20 122 f
>>
>> If I had to read the csv file, I would use something like:
>>
>> mydata <- read.csv(file = "c:/test.csv", header = TRUE)
>>
>> Now, if you look at mydata$c, the levels are alphabetically ordered:
>>
>> > mydata$c
>> [1] w k r s k p l u z s j j x r d j x w q f
>> Levels: d f j k l p q r s u w x z
>>
>> What I am trying to do is to reorder the levels so that they follow the order in which they appear in the table, i.e.
>>
>> Levels: w k r s p l u z j x d q f
>>
>> Again, keep in mind that my script should be used on datasets whose content is unknown to me.
>> In my example, I have used letters for mydata$c, but my code may have to handle factors of numeric or character values (I need to transform specific columns of my dataset into factors for plotting purposes). My goal is to let the code scan the content of each factor of my data.frame during or after the read.table step and reorder their levels automatically, without having to ask the user to hard-code the level order.
>>
>> In a way, my problem is more related to the way the factor levels are ordered than to the read.table function, although I guess there is a link...
>>
>> Gabor Grothendieck wrote:
>>> It's not clear from your description what you want. Could you be a bit more specific, including an example?
>>>
>>> On 8/28/07, Sébastien <[EMAIL PROTECTED]> wrote:
>>>> Thanks Gabor, I have two questions:
>>>>
>>>> 1- Is there any difference between your code and the following one, with regard to Fld2?
>>>>
>>>> ### test ###
>>>> Input <- "Fld1 Fld2
>>>> 10 A
>>>> 20 B
>>>> 30 C
>>>> 40 A
>>>> "
>>>> DF <- read.table(textConnection(Input), header = TRUE)
>>>> DF$Fld2 <- factor(DF$Fld2, levels = c("C", "A", "B"))
>>>>
>>>> 2- Do you see any way to bring flexibility to your method? It looks to me as if, at this stage, I have to i) know the order of my levels before I read the table and ii) create one class per factor. My problem is that I am not really working on a specific dataset. My goal is to develop R scripts capable of handling datasets which have various contents but similar structures. So, I really need to minimize the quantity of "user-specific" code.
>>>>
>>>> Sebastien
>>>>
>>>> Gabor Grothendieck wrote:
>>>>> You can create your own class and pass that to read.table. In the example below, Fld2 is read in with factor levels C, A, B in that order.
>>>>>
>>>>> library(methods)
>>>>> setClass("my.levels")
>>>>> setAs("character", "my.levels",
>>>>>       function(from) factor(from, levels = c("C", "A", "B")))
>>>>>
>>>>> ### test ###
>>>>> Input <- "Fld1 Fld2
>>>>> 10 A
>>>>> 20 B
>>>>> 30 C
>>>>> 40 A
>>>>> "
>>>>> DF <- read.table(textConnection(Input), header = TRUE,
>>>>>                  colClasses = c("numeric", "my.levels"))
>>>>> str(DF)
>>>>> # or
>>>>> DF <- read.table(textConnection(Input), header = TRUE,
>>>>>                  colClasses = list(Fld2 = "my.levels"))
>>>>> str(DF)
>>>>>
>>>>> On 8/28/07, Sébastien <[EMAIL PROTECTED]> wrote:
>>>>>> Dear R-users,
>>>>>>
>>>>>> I have found this not-so-recent post in the archives (http://tolstoy.newcastle.edu.au/R/devel/00a/0291.html) while I was looking for a particular way to reorder factor levels. The author asked whether the read.table function could be modified to order the levels of newly created factors "according to the order that they appear in the data file". That is exactly what I am looking for. As there was no reply to this post, I wonder if any move has been made towards implementing this suggestion. A quick look at ?read.table tells me that if this option was implemented, it was not in the read.table function...
>>>>>>
>>>>>> Sebastien
>>>>>>
>>>>>> PS: I am sorry to post so many messages on the list, but I am learning R (basically by trial & error ;-) ) and no one around me has even a slight notion about it...
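The question of using "my.factor" for several columns could be handled with a named colClasses vector, sketched below. In current versions of R, ?read.table documents that a named colClasses is matched to the column names. Unnamed columns are treated as NA and converted as usual, so the class can simply be repeated once per column. This is a sketch under assumptions. It assumes R 2.14.0 or later for the text argument, and read_with_my_factor() and its cols argument are hypothetical names. read.table() passes such columns to as() as character data, so a numeric-looking column such as b also gets its levels in order of appearance.

```r
library(methods)

# Gabor's class from the thread: levels follow the order of first appearance
setClass("my.factor")
setAs("character", "my.factor",
      function(from) factor(from, levels = unique(from)))

# Hypothetical helper: repeat "my.factor" once per selected column in a
# *named* colClasses vector; columns not named keep the default conversion.
read_with_my_factor <- function(text, cols) {
  cc <- setNames(rep("my.factor", length(cols)), cols)
  read.table(text = text, header = TRUE, colClasses = cc)
}

Input <- "a b c d
1 1 175 n f
2 2 102 n j
3 3 187 o n
4 4 106 u g
"
DF <- read_with_my_factor(Input, c("b", "c", "d"))
levels(DF$b)  # "175" "102" "187" "106"
levels(DF$c)  # "n" "o" "u"
levels(DF$d)  # "f" "j" "n" "g"
class(DF$a)   # "integer"
```

If the column names are not known in advance, the header line could be read first, for example with scan(text = Input, what = "", nlines = 1, quiet = TRUE). The script can then choose from those names which ones to pass as cols.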
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.