Re: [R] Associative array?

rkevinburton Sat, 12 Jul 2008 22:30:29 -0700

This is almost it. Maybe it is as good as can be expected. The only problem 
that I see is that this seems to form Category/SubCategory pairs where none 
existed in the original data. For example, A might have two sub-categories a 
and b, and B might have two sub-categories c and d. As far as I can tell the 
method that you outlined forms Category/SubCategory pairs like B a or B b 
where none existed. This results in a lot of empty lists and it seems to take 
a long time to generate. But if that is as good as it gets then I can live 
with it.
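
One possible way around the empty pairs, sketched here on made-up data: 
split() accepts a drop argument, and drop = TRUE discards combinations that 
never occur. The data frame and the names all_pairs and real_pairs are 
assumptions for illustration only, not the data from Sample.dat.

```r
# Toy data (hypothetical): A has sub-categories a and b,
# B has sub-categories c and d, so only four pairs really occur.
x <- data.frame(
  Category    = c("A", "A", "A", "B", "B", "B"),
  SubCategory = c("a", "b", "a", "c", "d", "d"),
  Quantity    = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = TRUE
)

# Default: every Category x SubCategory combination gets a list element,
# including empty ones such as A.c or B.a.
all_pairs <- split(x, list(x$Category, x$SubCategory))
length(all_pairs)

# drop = TRUE keeps only the combinations that are present in the data.
real_pairs <- split(x, list(x$Category, x$SubCategory), drop = TRUE)
names(real_pairs)
```

With many categories and sub-categories the default builds one element per 
possible pair, which is probably where the empty lists and the long run time 
come from; dropping the unused pairs should help with both.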


I know that I said one more question. But I have run into a problem. 
c <- split(x, x$Category) returns a vector of the rows in each of the 
categories. Now I would like to access the "Quantity" column within this split 
vector. I can see it listed. I just can't access it. I have tried 
c[1]$Quantity and c[1,2], both of which give me errors. Any ideas? 
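
A minimal sketch of how the pieces of a split can be reached, assuming a small 
made-up data frame in place of the real file. The result of split() is a list 
of data frames, so single brackets give back a one-element list while double 
brackets (or $ with the category name) give the data frame itself. The name 
parts is hypothetical and is used instead of c so that R's c() function is not 
shadowed.

```r
# Toy stand-in for the real data (values are illustrative only)
x <- data.frame(
  Category = c("A", "B", "A", "B"),
  Quantity = c(82, 78, 2, 4)
)
parts <- split(x, x$Category)

class(parts[1])      # "list": still a list, so $Quantity finds nothing
class(parts[[1]])    # "data.frame": the rows for the first category

q_first <- parts[[1]]$Quantity          # Quantity column, first category
q_named <- parts$A$Quantity             # the same column, selected by name
q_cell  <- parts[["B"]][1, "Quantity"]  # row/column indexing inside a group

# Apply a function to every category in one go
totals <- sapply(parts, function(d) sum(d$Quantity))
```

Compared with C, parts is closer to an array of pointers to separate tables 
than to a two-dimensional array, which is why c[1,2] fails: the list itself 
has only one dimension.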

Sorry this is so hard for me. I am more used to C type arrays and C type arrays 
of structures. This seems to be somewhat different.

Thank you.

Kevin
---- jim holtman <[EMAIL PROTECTED]> wrote: 
> Is this something like what you were asking for?  The output of a
> 'split' will be a list of the dataframe subsets for the categories you
> have specified.
> 
> > x <- data.frame(g1=sample(LETTERS[1:2],30,TRUE),
> +     g2=sample(letters[1:2], 30, TRUE),
> +     g3=1:30)
> > y <- split(x, list(x$g1, x$g2))
> > str(y)
> List of 4
>  $ A.a:'data.frame':    7 obs. of  3 variables:
>   ..$ g1: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1
>   ..$ g2: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1
>   ..$ g3: int [1:7] 3 4 6 8 9 13 24
>  $ B.a:'data.frame':    7 obs. of  3 variables:
>   ..$ g1: Factor w/ 2 levels "A","B": 2 2 2 2 2 2 2
>   ..$ g2: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1
>   ..$ g3: int [1:7] 10 11 16 17 18 20 25
>  $ A.b:'data.frame':    6 obs. of  3 variables:
>   ..$ g1: Factor w/ 2 levels "A","B": 1 1 1 1 1 1
>   ..$ g2: Factor w/ 2 levels "a","b": 2 2 2 2 2 2
>   ..$ g3: int [1:6] 2 12 23 26 27 29
>  $ B.b:'data.frame':    10 obs. of  3 variables:
>   ..$ g1: Factor w/ 2 levels "A","B": 2 2 2 2 2 2 2 2 2 2
>   ..$ g2: Factor w/ 2 levels "a","b": 2 2 2 2 2 2 2 2 2 2
>   ..$ g3: int [1:10] 1 5 7 14 15 19 21 22 28 30
> > y
> $A.a
>    g1 g2 g3
> 3   A  a  3
> 4   A  a  4
> 6   A  a  6
> 8   A  a  8
> 9   A  a  9
> 13  A  a 13
> 24  A  a 24
> 
> $B.a
>    g1 g2 g3
> 10  B  a 10
> 11  B  a 11
> 16  B  a 16
> 17  B  a 17
> 18  B  a 18
> 20  B  a 20
> 25  B  a 25
> 
> $A.b
>    g1 g2 g3
> 2   A  b  2
> 12  A  b 12
> 23  A  b 23
> 26  A  b 26
> 27  A  b 27
> 29  A  b 29
> 
> $B.b
>    g1 g2 g3
> 1   B  b  1
> 5   B  b  5
> 7   B  b  7
> 14  B  b 14
> 15  B  b 15
> 19  B  b 19
> 21  B  b 21
> 22  B  b 22
> 28  B  b 28
> 30  B  b 30
> 
> > y[[2]]
>    g1 g2 g3
> 10  B  a 10
> 11  B  a 11
> 16  B  a 16
> 17  B  a 17
> 18  B  a 18
> 20  B  a 20
> 25  B  a 25
> >
> >
> >
> 
> 
> On Sat, Jul 12, 2008 at 8:51 PM,  <[EMAIL PROTECTED]> wrote:
> > OK. Now I know that I am dealing with a data frame. One last question on 
> > this topic. a <- read.csv() gives me a dataframe. If I have 'c <- split(x, 
> > x$Category), then what is  returned by split in this case? c[1] seems to be 
> > OK but c[2] is not right in my mind. If I run ci <- split(nrow(a), 
> > a$Category). And then ci[1] seems to be the rows associated with the first 
> > category, c[2] is the indices/rows associated with the second category, 
> > etc. But this seems different than c[1], c[2], etc.
> >
> > Using the techniques below I can get the information on the categories. Now 
> > as an extra level of complexity there are SubCategories within each 
> > Category. Assume that the SubCategory names are not unique within the 
> > dataset so if I want the SubCategory data I need to retrieve the indices (or 
> > data) for the Category and SubCategory pair. In other words if I have a 
> > Category that ranges from 'A' to 'Z', it is possible that I might have a 
> > subcategory A a, A b (where a and b are the sub category names). I also 
> > might have B a, B b. I want all of the sub categories A a. NOT the 
> > subcategories a (because that might include B a which would be different). 
> > I am guessing that this will take more than a simple 'split'.
> >
> > Thank you.
> >
> > Kevin
> >
> > ---- Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> >> On 12/07/2008 3:59 PM, [EMAIL PROTECTED] wrote:
> >> > I am sorry but if read.csv returns a dataframe and a dataframe is like a 
> >> > matrix and I have a set of input like below and a[1,] gives me the first 
> >> > row, what is the second index? From what I read and your input I am 
> >> > guessing that it is the column number. So a[1,1] would return the 
> >> > DayOfYear column for the first row, right? What does a$DayOfYear return?
> >>
> >> a$DayOfYear would be the same as a[,1] or a[,"DayOfYear"], i.e. it would
> >> return the entire first column.
> >>
> >> Duncan Murdoch
> >>
> >> >
> >> > Thank you for your patience.
> >> >
> >> > Kevin
> >> >
> >> > ---- Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> >> >> On 12/07/2008 12:31 PM, [EMAIL PROTECTED] wrote:
> >> >>> I am using a simple R statement to read in the file:
> >> >>>
> >> >>> a <- read.csv("Sample.dat", header=TRUE)
> >> >>>
> >> >>> There is a lot of data but the first few lines look like:
> >> >>>
> >> >>> DayOfYear,Quantity,Fraction,Category,SubCategory
> >> >>> 1,82,0.0000390392720794458,(Unknown),(Unknown)
> >> >>> 2,78,0.0000371349173438631,(Unknown),(Unknown)
> >> >>> . . .
> >> >>> 71,2,0.0000009521773677913,WOMEN,Piratesses
> >> >>> 72,4,0.0000019043547355827,WOMEN,Piratesses
> >> >>> 73,3,0.0000014282660516870,WOMEN,Piratesses
> >> >>> 74,14,0.0000066652415745395,WOMEN,Piratesses
> >> >>> 75,2,0.0000009521773677913,WOMEN,Piratesses
> >> >>>
> >> >>> If I read the data in as above, the command
> >> >>>
> >> >>> a[1]
> >> >>>
> >> >>> results in the output
> >> >>>
> >> >>> [ reached getOption("max.print") -- omitted 16193 rows ]]
> >> >>>
> >> >>> Shouldn't this be the first row?
> >> >> No, the first row would be a[1,].  read.csv() returns a dataframe, and
> >> >> those are indexed with two indices to treat them like a matrix, or with
> >> >> one index to treat them like a list of their columns.
> >> >>
> >> >> Duncan Murdoch
> >> >>
> >> >>> a$Category[1]
> >> >>>
> >> >>> results in the output
> >> >>>
> >> >>> [1] (Unknown)
> >> >>> 4464 Levels:   Tags ... WOMEN
> >> >>>
> >> >>> But
> >> >>>
> >> >>> a$Category[365]
> >> >>>
> >> >>> gives me:
> >> >>>
> >> >>> [1] 7 Plates   (Dessert),Western\n120,5,0.0000023804434194784,7 Plates 
> >> >>>   (Dessert)
> >> >>> 4464 Levels:   Tags ... WOMEN
> >> >>>
> >> >>> There is something fundamental about either vectors or the read.csv 
> >> >>> command that I am missing here.
> >> >>>
> >> >>> Thank you.
> >> >>>
> >> >>> Kevin
> >> >>>
> >> >>> ---- jim holtman <[EMAIL PROTECTED]> wrote:
> >> >>>> Please provide commented, minimal, self-contained, reproducible code,
> >> >>>> or at least a before/after of what your data would look like.  Taking a
> >> >>>> guess at what you are asking, here is one way of doing it:
> >> >>>>
> >> >>>>
> >> >>>>> x <- data.frame(cat=sample(LETTERS[1:3],20,TRUE),a=1:20, b=runif(20))
> >> >>>>> x
> >> >>>>    cat  a          b
> >> >>>> 1    B  1 0.65472393
> >> >>>> 2    C  2 0.35319727
> >> >>>> 3    B  3 0.27026015
> >> >>>> 4    A  4 0.99268406
> >> >>>> 5    C  5 0.63349326
> >> >>>> 6    A  6 0.21320814
> >> >>>> 7    C  7 0.12937235
> >> >>>> 8    A  8 0.47811803
> >> >>>> 9    A  9 0.92407447
> >> >>>> 10   A 10 0.59876097
> >> >>>> 11   A 11 0.97617069
> >> >>>> 12   A 12 0.73179251
> >> >>>> 13   B 13 0.35672691
> >> >>>> 14   C 14 0.43147369
> >> >>>> 15   C 15 0.14821156
> >> >>>> 16   C 16 0.01307758
> >> >>>> 17   B 17 0.71556607
> >> >>>> 18   B 18 0.10318424
> >> >>>> 19   C 19 0.44628435
> >> >>>> 20   B 20 0.64010105
> >> >>>>> # create a list of the indices of the data grouped by 'cat'
> >> >>>>> split(seq(nrow(x)), x$cat)
> >> >>>> $A
> >> >>>> [1]  4  6  8  9 10 11 12
> >> >>>>
> >> >>>> $B
> >> >>>> [1]  1  3 13 17 18 20
> >> >>>>
> >> >>>> $C
> >> >>>> [1]  2  5  7 14 15 16 19
> >> >>>>
> >> >>>>> # or do you want the data
> >> >>>>> split(x, x$cat)
> >> >>>> $A
> >> >>>>    cat  a         b
> >> >>>> 4    A  4 0.9926841
> >> >>>> 6    A  6 0.2132081
> >> >>>> 8    A  8 0.4781180
> >> >>>> 9    A  9 0.9240745
> >> >>>> 10   A 10 0.5987610
> >> >>>> 11   A 11 0.9761707
> >> >>>> 12   A 12 0.7317925
> >> >>>>
> >> >>>> $B
> >> >>>>    cat  a         b
> >> >>>> 1    B  1 0.6547239
> >> >>>> 3    B  3 0.2702601
> >> >>>> 13   B 13 0.3567269
> >> >>>> 17   B 17 0.7155661
> >> >>>> 18   B 18 0.1031842
> >> >>>> 20   B 20 0.6401010
> >> >>>>
> >> >>>> $C
> >> >>>>    cat  a          b
> >> >>>> 2    C  2 0.35319727
> >> >>>> 5    C  5 0.63349326
> >> >>>> 7    C  7 0.12937235
> >> >>>> 14   C 14 0.43147369
> >> >>>> 15   C 15 0.14821156
> >> >>>> 16   C 16 0.01307758
> >> >>>> 19   C 19 0.44628435
> >> >>>>
> >> >>>>
> >> >>>> On Sat, Jul 12, 2008 at 3:32 AM,  <[EMAIL PROTECTED]> wrote:
> >> >>>>> I have searched the archive and I could not find what I need so I will 
> >> >>>>> try to ask the question here.
> >> >>>>>
> >> >>>>> I read a table in (read.table)
> >> >>>>>
> >> >>>>> a <- read.table(.....)
> >> >>>>>
> >> >>>>> The table has column names like DayOfYear, Quantity, and Category.
> >> >>>>>
> >> >>>>> The values in the row for Category are strings (characters).
> >> >>>>>
> >> >>>>> I want to get all of the rows grouped by Category. The number of 
> >> >>>>> unique category names could be around 50. Say for argument sake the 
> >> >>>>> number of categories is exactly 50. Can I somehow get a vector of 
> >> >>>>> length 50 containing the rows corresponding to the category (another 
> >> >>>>> vector)? I realize I can access any row a[i]$Category (right?). But 
> >> >>>>> I want a vector containing the rows corresponding to each distinct 
> >> >>>>> Category name.
> >> >>>>>
> >> >>>>> Thank you.
> >> >>>>>
> >> >>>>> Kevin
> >> >>>>>
> >> >>>>> ______________________________________________
> >> >>>>> [email protected] mailing list
> >> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >>>>> PLEASE do read the posting guide 
> >> >>>>> http://www.R-project.org/posting-guide.html
> >> >>>>> and provide commented, minimal, self-contained, reproducible code.
> >> >>>>>
> >> >>>>
> >> >>>> --
> >> >>>> Jim Holtman
> >> >>>> Cincinnati, OH
> >> >>>> +1 513 646 9390
> >> >>>>
> >> >>>> What is the problem you are trying to solve?
> >>
> >
> >
> 
> 
> 
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> 
> What is the problem you are trying to solve?
