Re: [R] Associative array?

rkevinburton Sat, 12 Jul 2008 13:35:13 -0700

I think there is a problem with my file or with 'read.csv'.

As you said, a[1,] returns the first row


a[1,]
  DayOfYear Quantity              Fraction  Category SubCategory
1         1       82 0.0000390392720794458 (Unknown)   (Unknown)

a[2,] returns the second row

 a[2,]
  DayOfYear Quantity              Fraction  Category SubCategory
2         2       78 0.0000371349173438631 (Unknown)   (Unknown)

This seems to continue up to row 348. After that, when I issue the command for
what I would expect to be the 365th row, I get:

a[365,]
    DayOfYear Quantity              Fraction
365        82        4 0.0000019043547355827
                                                                          
Category
365 7 Plates   (Dessert),Western\n120,5,0.0000023804434194784,7 Plates   
(Dessert)
    SubCategory
365     Western

If I bring up WinEdt and look at this transition:

355,1,0.0000004760886838956,(Unknown),(Unknown)
362,15,0.0000071413302584352,(Unknown),(Unknown)
363,1,0.0000004760886838956,(Unknown),(Unknown)
1,2,0.0000009521773677913,7" Plates   (Dessert),Elmo Loves You/Hooray For Elmo
7,3,0.0000014282660516870,7" Plates   (Dessert),Elmo Loves You/Hooray For Elmo
18,8,0.0000038087094711654,7" Plates   (Dessert),Elmo Loves You/Hooray For Elmo

Could the " character cause read.csv to get confused?
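
One way to test this is to read a small sample with quote processing turned off. This is only a sketch: the three-line sample below is invented to resemble the rows above, and quote = "" is the documented read.table/read.csv way to disable quoting. Whether it suits the real file depends on whether any fields are deliberately quoted.

```r
# Hypothetical sample modeled on the rows shown above (values are made up).
csv_text <- paste(
  "DayOfYear,Quantity,Fraction,Category,SubCategory",
  "1,2,0.5,7\" Plates   (Dessert),Elmo",
  "7,3,0.25,7\" Plates   (Dessert),Elmo",
  sep = "\n"
)

# Default settings: " is a quoting character, so a lone " inside a field
# can make the text up to the next " be read as part of one field.
default_read <- read.csv(text = csv_text, header = TRUE)

# quote = "" disables quote processing, so " is kept as an ordinary character.
literal_read <- read.csv(text = csv_text, header = TRUE, quote = "")

print(nrow(default_read))
print(nrow(literal_read))
print(literal_read$Category)
```

If the two row counts differ, the inch mark is the likely culprit, and count.fields("Sample.dat", sep = ",", quote = "") can be used to confirm that every line has five fields once quoting is off.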

Thank you.

Kevin
---- [EMAIL PROTECTED] wrote: 
> I am sorry but if read.csv returns a dataframe and a dataframe is like a 
> matrix and I have a set of input like below and a[1,] gives me the first row, 
> what is the second index? From what I read and your input I am guessing that 
> it is the column number. So a[1,1] would return the DayOfYear column for the 
> first row, right? What does a$DayOfYear return?
> 
> Thank you for your patience.
> 
> Kevin
> 
> ---- Duncan Murdoch <[EMAIL PROTECTED]> wrote: 
> > On 12/07/2008 12:31 PM, [EMAIL PROTECTED] wrote:
> > > I am using a simple R statement to read in the file:
> > > 
> > > a <- read.csv("Sample.dat", header=TRUE)
> > > 
> > > There is a lot of data but the first few lines look like:
> > > 
> > > DayOfYear,Quantity,Fraction,Category,SubCategory
> > > 1,82,0.0000390392720794458,(Unknown),(Unknown)
> > > 2,78,0.0000371349173438631,(Unknown),(Unknown)
> > > . . .
> > > 71,2,0.0000009521773677913,WOMEN,Piratesses
> > > 72,4,0.0000019043547355827,WOMEN,Piratesses
> > > 73,3,0.0000014282660516870,WOMEN,Piratesses
> > > 74,14,0.0000066652415745395,WOMEN,Piratesses
> > > 75,2,0.0000009521773677913,WOMEN,Piratesses
> > > 
> > > If I read the data in as above, the command
> > > 
> > > a[1]
> > > 
> > > results in the output 
> > > 
> > > [ reached getOption("max.print") -- omitted 16193 rows ]]
> > > 
> > > Shouldn't this be the first row?
> > 
> > No, the first row would be a[1,].  read.csv() returns a dataframe, and 
> > those are indexed with two indices to treat them like a matrix, or with 
> > one index to treat them like a list of their columns.
> > 
> > Duncan Murdoch
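
The indexing rules described above can be checked on a small data frame. This is a minimal sketch: the data frame df and its values are made up for illustration and are not taken from Sample.dat.

```r
# Hypothetical three-row data frame with the same kind of columns.
df <- data.frame(DayOfYear = c(1, 2, 3),
                 Quantity  = c(82, 78, 80),
                 Category  = c("(Unknown)", "(Unknown)", "WOMEN"))

first_row    <- df[1, ]       # two indices, matrix style: row 1, all columns
first_cell   <- df[1, 1]      # row 1, column 1 (the DayOfYear value)
first_column <- df[1]         # one index, list style: a one-column data frame
day_vector   <- df$DayOfYear  # the DayOfYear column itself, as a vector

print(first_row)
print(first_column)
print(day_vector)
```

So a[1] on a large data frame prints an entire column, which is why the output runs into the max.print limit.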
> > 
> > > 
> > > a$Category[1]
> > > 
> > > results in the output
> > > 
> > > [1] (Unknown)
> > > 4464 Levels:   Tags ... WOMEN
> > > 
> > > But
> > > 
> > > a$Category[365]
> > > 
> > > gives me:
> > > 
> > > [1] 7 Plates   (Dessert),Western\n120,5,0.0000023804434194784,7 Plates   
> > > (Dessert)
> > > 4464 Levels:   Tags ... WOMEN
> > > 
> > > There is something fundamental about either vectors or the read.csv 
> > > command that I am missing here.
> > > 
> > > Thank you.
> > > 
> > > Kevin
> > > 
> > > ---- jim holtman <[EMAIL PROTECTED]> wrote: 
> > >> Please provide commented, minimal, self-contained, reproducible code,
> > >> or at least a before/after of what your data would look like.  Taking a
> > >> guess at what you are asking, here is one way of doing it:
> > >>
> > >>
> > >>> x <- data.frame(cat=sample(LETTERS[1:3],20,TRUE),a=1:20, b=runif(20))
> > >>> x
> > >>    cat  a          b
> > >> 1    B  1 0.65472393
> > >> 2    C  2 0.35319727
> > >> 3    B  3 0.27026015
> > >> 4    A  4 0.99268406
> > >> 5    C  5 0.63349326
> > >> 6    A  6 0.21320814
> > >> 7    C  7 0.12937235
> > >> 8    A  8 0.47811803
> > >> 9    A  9 0.92407447
> > >> 10   A 10 0.59876097
> > >> 11   A 11 0.97617069
> > >> 12   A 12 0.73179251
> > >> 13   B 13 0.35672691
> > >> 14   C 14 0.43147369
> > >> 15   C 15 0.14821156
> > >> 16   C 16 0.01307758
> > >> 17   B 17 0.71556607
> > >> 18   B 18 0.10318424
> > >> 19   C 19 0.44628435
> > >> 20   B 20 0.64010105
> > >>> # create a list of the indices of the data grouped by 'cat'
> > >>> split(seq(nrow(x)), x$cat)
> > >> $A
> > >> [1]  4  6  8  9 10 11 12
> > >>
> > >> $B
> > >> [1]  1  3 13 17 18 20
> > >>
> > >> $C
> > >> [1]  2  5  7 14 15 16 19
> > >>
> > >>> # or do you want the data
> > >>> split(x, x$cat)
> > >> $A
> > >>    cat  a         b
> > >> 4    A  4 0.9926841
> > >> 6    A  6 0.2132081
> > >> 8    A  8 0.4781180
> > >> 9    A  9 0.9240745
> > >> 10   A 10 0.5987610
> > >> 11   A 11 0.9761707
> > >> 12   A 12 0.7317925
> > >>
> > >> $B
> > >>    cat  a         b
> > >> 1    B  1 0.6547239
> > >> 3    B  3 0.2702601
> > >> 13   B 13 0.3567269
> > >> 17   B 17 0.7155661
> > >> 18   B 18 0.1031842
> > >> 20   B 20 0.6401010
> > >>
> > >> $C
> > >>    cat  a          b
> > >> 2    C  2 0.35319727
> > >> 5    C  5 0.63349326
> > >> 7    C  7 0.12937235
> > >> 14   C 14 0.43147369
> > >> 15   C 15 0.14821156
> > >> 16   C 16 0.01307758
> > >> 19   C 19 0.44628435
> > >>
> > >>
> > >> On Sat, Jul 12, 2008 at 3:32 AM,  <[EMAIL PROTECTED]> wrote:
> > >>> I have searched the archive and I could not find what I need so I will 
> > >>> try to ask the question here.
> > >>>
> > >>> I read a table in (read.table)
> > >>>
> > >>> a <- read.table(.....)
> > >>>
> > >>> The table has column names like DayOfYear, Quantity, and Category.
> > >>>
> > >>> The values in the row for Category are strings (characters).
> > >>>
> > >>> I want to get all of the rows grouped by Category. The number of unique 
> > >>> category names could be around 50. Say for argument's sake the number of 
> > >>> categories is exactly 50. Can I somehow get a vector of length 50 
> > >>> containing the rows corresponding to the category (another vector)? I 
> > >>> realize I can access any row a[i]$Category (right?). But I want a vector 
> > >>> containing the rows corresponding to each distinct Category name.
> > >>>
> > >>> Thank you.
> > >>>
> > >>> Kevin
> > >>>
> > >>> ______________________________________________
> > >>> [email protected] mailing list
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide 
> > >>> http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >>>
> > >>
> > >>
> > >> -- 
> > >> Jim Holtman
> > >> Cincinnati, OH
> > >> +1 513 646 9390
> > >>
> > >> What is the problem you are trying to solve?