Re: [R] Assoociative array?

jim holtman Sat, 12 Jul 2008 14:29:07 -0700

Yes.  If you want read.csv to ignore quotes, then pass this as a parameter:   quote=""


What is happening is that otherwise it assumes you have values
enclosed in quotes because they might have spaces or the separator
character in them.  Your data (7" Plate) looks like a quoted string
starting after the 7.  That does appear to be what your output is also
saying.
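
A minimal sketch of the fix, assuming a few rows copied from your message
stand in for Sample.dat (the names csv_text and a_noquote are only for
illustration):

```r
# Three data rows taken from the quoted message, standing in for Sample.dat.
csv_text <- 'DayOfYear,Quantity,Fraction,Category,SubCategory
1,2,0.0000009521773677913,7" Plates   (Dessert),Elmo Loves You/Hooray For Elmo
7,3,0.0000014282660516870,7" Plates   (Dessert),Elmo Loves You/Hooray For Elmo
18,8,0.0000038087094711654,7" Plates   (Dessert),Elmo Loves You/Hooray For Elmo'

# quote = "" turns off quote processing, so the inch mark is read as an
# ordinary character and every input line stays a separate record.
a_noquote <- read.csv(text = csv_text, header = TRUE, quote = "")

nrow(a_noquote)                       # number of data rows read
as.character(a_noquote$Category[1])   # Category value of the first row
```

For the file on disk the same idea would be
read.csv("Sample.dat", header=TRUE, quote="").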

On Sat, Jul 12, 2008 at 4:31 PM,  <[EMAIL PROTECTED]> wrote:
> I think there is a problem with my file or with 'read.csv'.
>
> As you said, a[1,] returns the first row
>
> a[1,]
>  DayOfYear Quantity              Fraction  Category SubCategory
> 1         1       82 0.0000390392720794458 (Unknown)   (Unknown)
>
> a[2,] returns the second row
>
>  a[2,]
>  DayOfYear Quantity              Fraction  Category SubCategory
> 2         2       78 0.0000371349173438631 (Unknown)   (Unknown)
>
> This seems to continue up to row 348 after which I get something like:
>
> But when I issue the command for what I would suspect to be the 365th row:
>
> I get:
>
> a[365,]
>    DayOfYear Quantity              Fraction
> 365        82        4 0.0000019043547355827
>                                                                          
> Category
> 365 7 Plates   (Dessert),Western\n120,5,0.0000023804434194784,7 Plates   
> (Dessert)
>    SubCategory
> 365     Western
>
> If I bring up WinEdt and look at this transition:
>
> 355,1,0.0000004760886838956,(Unknown),(Unknown)
> 362,15,0.0000071413302584352,(Unknown),(Unknown)
> 363,1,0.0000004760886838956,(Unknown),(Unknown)
> 1,2,0.0000009521773677913,7" Plates   (Dessert),Elmo Loves You/Hooray For Elmo
> 7,3,0.0000014282660516870,7" Plates   (Dessert),Elmo Loves You/Hooray For Elmo
> 18,8,0.0000038087094711654,7" Plates   (Dessert),Elmo Loves You/Hooray For Elmo
>
> Could the " character cause read.csv to get confused?
>
> Thank you.
>
> Kevin
> ---- [EMAIL PROTECTED] wrote:
>> I am sorry but if read.csv returns a dataframe and a dataframe is like a 
>> matrix and I have a set of input like below and a[1,] gives me the first 
>> row, what is the second index? From what I read and your input I am guessing 
>> that it is the column number. So a[1,1] would return the DayOfYear column 
>> for the first row, right? What does a$DayOfYear return?
>>
>> Thank you for your patience.
>>
>> Kevin
>>
>> ---- Duncan Murdoch <[EMAIL PROTECTED]> wrote:
>> > On 12/07/2008 12:31 PM, [EMAIL PROTECTED] wrote:
>> > > I am using a simple R statement to read in the file:
>> > >
>> > > a <- read.csv("Sample.dat", header=TRUE)
>> > >
>> > > There is a lot of data but the first few lines look like:
>> > >
>> > > DayOfYear,Quantity,Fraction,Category,SubCategory
>> > > 1,82,0.0000390392720794458,(Unknown),(Unknown)
>> > > 2,78,0.0000371349173438631,(Unknown),(Unknown)
>> > > . . .
>> > > 71,2,0.0000009521773677913,WOMEN,Piratesses
>> > > 72,4,0.0000019043547355827,WOMEN,Piratesses
>> > > 73,3,0.0000014282660516870,WOMEN,Piratesses
>> > > 74,14,0.0000066652415745395,WOMEN,Piratesses
>> > > 75,2,0.0000009521773677913,WOMEN,Piratesses
>> > >
>> > > If I read the data in as above, the command
>> > >
>> > > a[1]
>> > >
>> > > results in the output
>> > >
>> > > [ reached getOption("max.print") -- omitted 16193 rows ]]
>> > >
>> > > Shouldn't this be the first row?
>> >
>> > No, the first row would be a[1,].  read.csv() returns a dataframe, and
>> > those are indexed with two indices to treat them like a matrix, or with
>> > one index to treat them like a list of their columns.
>> >
>> > Duncan Murdoch
>> >
>> > >
>> > > a$Category[1]
>> > >
>> > > results in the output
>> > >
>> > > [1] (Unknown)
>> > > 4464 Levels:   Tags ... WOMEN
>> > >
>> > > But
>> > >
>> > > a$Category[365]
>> > >
>> > > gives me:
>> > >
>> > > [1] 7 Plates   (Dessert),Western\n120,5,0.0000023804434194784,7 Plates   
>> > > (Dessert)
>> > > 4464 Levels:   Tags ... WOMEN
>> > >
>> > > There is something fundamental about either vectors or the read.csv 
>> > > command that I am missing here.
>> > >
>> > > Thank you.
>> > >
>> > > Kevin
>> > >
>> > > ---- jim holtman <[EMAIL PROTECTED]> wrote:
>> > >> Please provide commented, minimal, self-contained, reproducible code,
>> > >> or at least a before/after of what your data would look like.  Taking a
>> > >> guess at what you are asking, here is one way of doing it:
>> > >>
>> > >>
>> > >>> x <- data.frame(cat=sample(LETTERS[1:3],20,TRUE),a=1:20, b=runif(20))
>> > >>> x
>> > >>    cat  a          b
>> > >> 1    B  1 0.65472393
>> > >> 2    C  2 0.35319727
>> > >> 3    B  3 0.27026015
>> > >> 4    A  4 0.99268406
>> > >> 5    C  5 0.63349326
>> > >> 6    A  6 0.21320814
>> > >> 7    C  7 0.12937235
>> > >> 8    A  8 0.47811803
>> > >> 9    A  9 0.92407447
>> > >> 10   A 10 0.59876097
>> > >> 11   A 11 0.97617069
>> > >> 12   A 12 0.73179251
>> > >> 13   B 13 0.35672691
>> > >> 14   C 14 0.43147369
>> > >> 15   C 15 0.14821156
>> > >> 16   C 16 0.01307758
>> > >> 17   B 17 0.71556607
>> > >> 18   B 18 0.10318424
>> > >> 19   C 19 0.44628435
>> > >> 20   B 20 0.64010105
>> > >>> # create a list of the indices of the data grouped by 'cat'
>> > >>> split(seq(nrow(x)), x$cat)
>> > >> $A
>> > >> [1]  4  6  8  9 10 11 12
>> > >>
>> > >> $B
>> > >> [1]  1  3 13 17 18 20
>> > >>
>> > >> $C
>> > >> [1]  2  5  7 14 15 16 19
>> > >>
>> > >>> # or do you want the data
>> > >>> split(x, x$cat)
>> > >> $A
>> > >>    cat  a         b
>> > >> 4    A  4 0.9926841
>> > >> 6    A  6 0.2132081
>> > >> 8    A  8 0.4781180
>> > >> 9    A  9 0.9240745
>> > >> 10   A 10 0.5987610
>> > >> 11   A 11 0.9761707
>> > >> 12   A 12 0.7317925
>> > >>
>> > >> $B
>> > >>    cat  a         b
>> > >> 1    B  1 0.6547239
>> > >> 3    B  3 0.2702601
>> > >> 13   B 13 0.3567269
>> > >> 17   B 17 0.7155661
>> > >> 18   B 18 0.1031842
>> > >> 20   B 20 0.6401010
>> > >>
>> > >> $C
>> > >>    cat  a          b
>> > >> 2    C  2 0.35319727
>> > >> 5    C  5 0.63349326
>> > >> 7    C  7 0.12937235
>> > >> 14   C 14 0.43147369
>> > >> 15   C 15 0.14821156
>> > >> 16   C 16 0.01307758
>> > >> 19   C 19 0.44628435
>> > >>
>> > >>
>> > >> On Sat, Jul 12, 2008 at 3:32 AM,  <[EMAIL PROTECTED]> wrote:
>> > >>> I have searched the archive and I could not find what I need so I will 
>> > >>> try to ask the question here.
>> > >>>
>> > >>> I read a table in (read.table)
>> > >>>
>> > >>> a <- read.table(.....)
>> > >>>
>> > >>> The table has column names like DayOfYear, Quantity, and Category.
>> > >>>
>> > >>> The values in the row for Category are strings (characters).
>> > >>>
>> > >>> I want to get all of the rows grouped by Category. The number of 
>> > >>> unique category names could be around 50. Say for argument sake the 
>> > >>> number of categories is exactly 50. Can I somehow get a vector of 
>> > >>> length 50 containing the rows corresponding to the category (another 
>> > >>> vector)? I realize I can access any row a[i]$Category (right?). But I 
>> > >>> want a vector containing the rows corresponding to each distinct 
>> > >>> Category name.
>> > >>>
>> > >>> Thank you.
>> > >>>
>> > >>> Kevin
>> > >>>
>> > >>> ______________________________________________
>> > >>> R-help@r-project.org mailing list
>> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >>> PLEASE do read the posting guide 
>> > >>> http://www.R-project.org/posting-guide.html
>> > >>> and provide commented, minimal, self-contained, reproducible code.
>> > >>>
>> > >>
>> > >>
>> > >> --
>> > >> Jim Holtman
>> > >> Cincinnati, OH
>> > >> +1 513 646 9390
>> > >>
>> > >> What is the problem you are trying to solve?
>> > >
>> >
>>
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

