Hi Gabor,

I am coming back to you about the method you described to me a month ago 
to define the level order during a read.table call. I initially thought 
that I would only need to apply the 'unique' function to a single column of 
my dataset, so I only used it after the read.table step (to make my life 
easier)... Well, I was wrong: I need to reorder the levels in all my columns 
(just to remind you, I don't know the number of columns my code has to handle). 
So, here comes trouble.

I first tried to apply your code as is, although I suspected there might 
be some problems. The class is actually not recycled when the list 
notation is used (the help says: "colClasses: character. A *vector* 
of classes to be assumed for the columns. Recycled as necessary..."). 
See the following example:

######################

library(methods)

setClass("my.factor")

setAs("character", "my.factor",
  function(from) factor(from, levels = unique(from)))

Input <- "a b c d
1 1 175 n f
2 2 102 n j
3 3 187 o n
4 4 106 u g
5 5 102 o v
6 6 133 l x
7 7 149 w q
8 8 122 x p
9 9 151 u r
10 10 134 e g
11 11 170 j q
12 12 103 v n
13 13 153 n w
14 14 106 x x
15 15 185 v x
16 16 102 s p
17 17 181 i h
18 18 192 o k
19 19 161 d f
20 20 158 n q
"

DF <- read.table(textConnection(Input), header = TRUE,
  colClasses = list(c = "my.factor"))
levels(DF$c)         # properly ordered
levels(DF$d)         # not reordered

######################

I also tried this:

######################

DF <- read.table(textConnection(Input), header = TRUE,
  colClasses = c("my.factor"))
levels(DF$c)
levels(DF$d)

######################

In this case, the class is definitely recycled, as all the columns of DF 
are turned into factors... Not really useful :)
I tried to modify the content of the list, or my second notation, by 
including "integer" or a second "my.factor"... but I did not have much 
success.
Any idea how to use the class "my.factor" multiple times?
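
In the meantime, the post-read.table approach I mentioned above could 
probably be generalized to every factor column. Below is a minimal sketch, 
not a definitive solution: it assumes the levels may be reordered after 
reading rather than during the read.table call, the helper name 
reorder_levels is hypothetical, and stringsAsFactors = TRUE is spelled out 
because not every R version converts strings to factors by default.

```r
# Hypothetical helper (not from Gabor's code): reorder the levels of every
# factor column of a data frame to follow the order of first appearance.
reorder_levels <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  if (!any(is_fac)) return(df)          # nothing to reorder
  df[is_fac] <- lapply(df[is_fac], function(x) {
    factor(x, levels = unique(as.character(x)))
  })
  df
}

# Shortened version of the example data (first four rows only).
Input <- "a b c d
1 1 175 n f
2 2 102 n j
3 3 187 o n
4 4 106 u g
"

DF <- read.table(textConnection(Input), header = TRUE,
  stringsAsFactors = TRUE)
DF <- reorder_levels(DF)
levels(DF$c)   # "n" "o" "u"
levels(DF$d)   # "f" "j" "n" "g"
```

This only touches columns that are already factors; numeric columns that 
should become factors would still need to be converted separately.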

Thanks in advance

Gabor Grothendieck wrote:
> It's the same principle.  Just change the function to be suitable.  This one
> arranges the levels according to the input:
>
> library(methods)
> setClass("my.factor")
> setAs("character", "my.factor",
>  function(from) factor(from, levels = unique(from)))
>
> Input <- "a b c
> 1   1 176 w
> 2   2 141 k
> 3   3 172 r
> 4   4 182 s
> 5   5 123 k
> 6   6 153 p
> 7   7 176 l
> 8   8 170 u
> 9   9 140 z
> 10 10 194 s
> 11 11 164 j
> 12 12 100 j
> 13 13 127 x
> 14 14 137 r
> 15 15 198 d
> 16 16 173 j
> 17 17 113 x
> 18 18 144 w
> 19 19 198 q
> 20 20 122 f
> "
> DF <- read.table(textConnection(Input), header = TRUE,
>   colClasses = list(c = "my.factor"))
> str(DF)
>
>
> On 8/28/07, Sébastien <[EMAIL PROTECTED]> wrote:
>   
>> Ok, I cannot send you one of my datasets since they are confidential. But
>> I can produce a dummy "mini" dataset to illustrate my question. Let's say I
>> have a csv file with 3 columns and 20 rows whose content is reproduced by
>> the following lines.
>>
>> > mydata <- data.frame(a = 1:20, b = sample(100:200, 20, replace = TRUE),
>> +                      c = sample(letters[1:26], 20, replace = TRUE))
>> > mydata
>>     a   b c
>> 1   1 176 w
>> 2   2 141 k
>> 3   3 172 r
>> 4   4 182 s
>> 5   5 123 k
>> 6   6 153 p
>> 7   7 176 l
>> 8   8 170 u
>> 9   9 140 z
>> 10 10 194 s
>> 11 11 164 j
>> 12 12 100 j
>> 13 13 127 x
>> 14 14 137 r
>> 15 15 198 d
>> 16 16 173 j
>> 17 17 113 x
>> 18 18 144 w
>> 19 19 198 q
>> 20 20 122 f
>>
>> If I had to read the csv file, I would use something like:
>> mydata<-data.frame(read.table(file="c:/test.csv",header=T))
>>
>> Now, if you look at mydata$c, the levels are alphabetically ordered.
>> > mydata$c
>>  [1] w k r s k p l u z s j j x r d j x w q f
>> Levels: d f j k l p q r s u w x z
>>
>> What I am trying to do is to reorder the levels so as to have them in the
>> order they appear in the table, i.e.
>> Levels: w k r s p l u z j x d q f
>>
>> Again, keep in mind that my script should be used on datasets whose content
>> is unknown to me. In my example, I have used letters for mydata$c, but my
>> code may have to handle factors of numeric or character values (I need to
>> transform specific columns of my dataset into factors for plotting
>> purposes). My goal is to let the code scan the content of each factor of my
>> data.frame during or after the read.table step and reorder their levels
>> automatically without having to ask the user to hard-code the level order.
>>
>> In a way, my problem is more related to the way the factor levels are
>> ordered than to the read.table function, although I guess there is a link...
>>
>> Gabor Grothendieck wrote:
>> It's not clear from your description what you want. Could you be a bit more
>> specific, including an example?
>>
>> On 8/28/07, Sébastien <[EMAIL PROTECTED]> wrote:
>> Thanks Gabor, I have two questions:
>>
>> 1- Is there any difference between your code and the following one, with
>> regard to Fld2?
>>
>> ### test ###
>> Input <- "Fld1 Fld2
>> 10 A
>> 20 B
>> 30 C
>> 40 A
>> "
>> DF <- read.table(textConnection(Input), header = TRUE)
>> DF$Fld2 <- factor(DF$Fld2, levels = c("C", "A", "B"))
>>
>> 2- Do you see any way to bring flexibility to your method? It looks to me
>> as if, at this stage, I have to i) know the order of my levels before I
>> read the table and ii) create one class per factor.
>>
>> My problem is that I am not really working on a specific dataset. My goal
>> is to develop R scripts capable of handling datasets which have various
>> contents but similar structures. So, I really need to minimize the quantity
>> of "user-specific" code.
>>
>> Sebastien
>>
>> Gabor Grothendieck wrote:
>> You can create your own class and pass that to read.table. In the example
>> below Fld2 is read in with factor levels C, A, B in that order.
>>
>> library(methods)
>> setClass("my.levels")
>> setAs("character", "my.levels",
>>   function(from) factor(from, levels = c("C", "A", "B")))
>>
>> ### test ###
>> Input <- "Fld1 Fld2
>> 10 A
>> 20 B
>> 30 C
>> 40 A
>> "
>> DF <- read.table(textConnection(Input), header = TRUE,
>>   colClasses = c("numeric", "my.levels"))
>> str(DF)
>> # or
>> DF <- read.table(textConnection(Input), header = TRUE,
>>   colClasses = list(Fld2 = "my.levels"))
>> str(DF)
>>
>> On 8/28/07, Sébastien <[EMAIL PROTECTED]> wrote:
>> Dear R-users,
>>
>> I have found this not-so-recent post in the archives -
>> http://tolstoy.newcastle.edu.au/R/devel/00a/0291.html - while I was
>> looking for a particular way to reorder factor levels. The question
>> addressed by the author was whether the read.table function could be
>> modified to order the levels of newly created factors "according to the
>> order that they appear in the data file". Exactly what I am looking for.
>>
>> As there was no reply to this post, I wonder if any move has been made
>> towards the implementation of this suggestion. A quick look at
>> ?read.table tells me that if this option was implemented, it was not in
>> the read.table function...
>>
>> Sebastien
>>
>> PS: I am sorry to post so many messages on the list, but I am learning R
>> (basically by trial and error ;-) ) and no one around me has even a
>> slight notion about it...


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
