I am sorry, but if read.csv returns a dataframe, and a dataframe is like a
matrix, and a[1,] gives me the first row of input like that shown below, then
what is the second index? From what I have read and from your reply, I am
guessing that it is the column number. So a[1,1] would return the DayOfYear
value for the first row, right? And what does a$DayOfYear return?
Thank you for your patience.
Kevin
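
For anyone following along, the indexing forms asked about here can be tried on a tiny stand-in for the file. This is only a sketch: the three data rows are copied from the sample quoted further down, the inline text= input replaces the real Sample.dat, and the variable names (first_row, first_cell, and so on) are made up for illustration.

```r
# Hypothetical stand-in for the first rows of Sample.dat quoted in this thread.
a <- read.csv(text = "DayOfYear,Quantity,Fraction,Category,SubCategory
1,82,0.0000390392720794458,(Unknown),(Unknown)
2,78,0.0000371349173438631,(Unknown),(Unknown)
71,2,0.0000009521773677913,WOMEN,Piratesses", header = TRUE)

# Two indices: treat the data frame like a matrix, [row, column].
first_row  <- a[1, ]             # one-row data frame with every column of row 1
first_cell <- a[1, 1]            # row 1, column 1: DayOfYear for the first row
same_cell  <- a[1, "DayOfYear"]  # the same cell, selected by column name

# $ or a single index: treat the data frame like a list of columns.
day_column <- a$DayOfYear        # the whole DayOfYear column as a plain vector
one_col_df <- a[1]               # a data frame holding only the first column
```

With two indices the second one is indeed the column, so a[1,1] is the single DayOfYear value in row 1, while a$DayOfYear is the entire column. A single index such as a[1] selects a column rather than a row, which is why it printed thousands of lines in the original question.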
---- Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> On 12/07/2008 12:31 PM, [EMAIL PROTECTED] wrote:
> > I am using a simple R statement to read in the file:
> >
> > a <- read.csv("Sample.dat", header=TRUE)
> >
> > There is a lot of data but the first few lines look like:
> >
> > DayOfYear,Quantity,Fraction,Category,SubCategory
> > 1,82,0.0000390392720794458,(Unknown),(Unknown)
> > 2,78,0.0000371349173438631,(Unknown),(Unknown)
> > . . .
> > 71,2,0.0000009521773677913,WOMEN,Piratesses
> > 72,4,0.0000019043547355827,WOMEN,Piratesses
> > 73,3,0.0000014282660516870,WOMEN,Piratesses
> > 74,14,0.0000066652415745395,WOMEN,Piratesses
> > 75,2,0.0000009521773677913,WOMEN,Piratesses
> >
> > If I read the data in as above, the command
> >
> > a[1]
> >
> > results in the output
> >
> > [ reached getOption("max.print") -- omitted 16193 rows ]]
> >
> > Shouldn't this be the first row?
>
> No, the first row would be a[1,]. read.csv() returns a dataframe, and
> those are indexed with two indices to treat them like a matrix, or with
> one index to treat them like a list of their columns.
>
> Duncan Murdoch
>
> >
> > a$Category[1]
> >
> > results in the output
> >
> > [1] (Unknown)
> > 4464 Levels: Tags ... WOMEN
> >
> > But
> >
> > a$Category[365]
> >
> > gives me:
> >
> > [1] 7 Plates (Dessert),Western\n120,5,0.0000023804434194784,7 Plates
> > (Dessert)
> > 4464 Levels: Tags ... WOMEN
> >
> > There is something fundamental about either vectors or the read.csv command
> > that I am missing here.
> >
> > Thank you.
> >
> > Kevin
> >
> > ---- jim holtman <[EMAIL PROTECTED]> wrote:
> >> Please provide commented, minimal, self-contained, reproducible code,
> >> or at least a before/after of what your data would look like. Taking a
> >> guess at what you are asking, here is one way of doing it:
> >>
> >>
> >>> x <- data.frame(cat=sample(LETTERS[1:3],20,TRUE),a=1:20, b=runif(20))
> >>> x
> >> cat a b
> >> 1 B 1 0.65472393
> >> 2 C 2 0.35319727
> >> 3 B 3 0.27026015
> >> 4 A 4 0.99268406
> >> 5 C 5 0.63349326
> >> 6 A 6 0.21320814
> >> 7 C 7 0.12937235
> >> 8 A 8 0.47811803
> >> 9 A 9 0.92407447
> >> 10 A 10 0.59876097
> >> 11 A 11 0.97617069
> >> 12 A 12 0.73179251
> >> 13 B 13 0.35672691
> >> 14 C 14 0.43147369
> >> 15 C 15 0.14821156
> >> 16 C 16 0.01307758
> >> 17 B 17 0.71556607
> >> 18 B 18 0.10318424
> >> 19 C 19 0.44628435
> >> 20 B 20 0.64010105
> >>> # create a list of the indices of the data grouped by 'cat'
> >>> split(seq(nrow(x)), x$cat)
> >> $A
> >> [1] 4 6 8 9 10 11 12
> >>
> >> $B
> >> [1] 1 3 13 17 18 20
> >>
> >> $C
> >> [1] 2 5 7 14 15 16 19
> >>
> >>> # or do you want the data
> >>> split(x, x$cat)
> >> $A
> >> cat a b
> >> 4 A 4 0.9926841
> >> 6 A 6 0.2132081
> >> 8 A 8 0.4781180
> >> 9 A 9 0.9240745
> >> 10 A 10 0.5987610
> >> 11 A 11 0.9761707
> >> 12 A 12 0.7317925
> >>
> >> $B
> >> cat a b
> >> 1 B 1 0.6547239
> >> 3 B 3 0.2702601
> >> 13 B 13 0.3567269
> >> 17 B 17 0.7155661
> >> 18 B 18 0.1031842
> >> 20 B 20 0.6401010
> >>
> >> $C
> >> cat a b
> >> 2 C 2 0.35319727
> >> 5 C 5 0.63349326
> >> 7 C 7 0.12937235
> >> 14 C 14 0.43147369
> >> 15 C 15 0.14821156
> >> 16 C 16 0.01307758
> >> 19 C 19 0.44628435
> >>
> >>
> >> On Sat, Jul 12, 2008 at 3:32 AM, <[EMAIL PROTECTED]> wrote:
> >>> I have searched the archive and I could not find what I need so I will try
> >>> to ask the question here.
> >>>
> >>> I read a table in (read.table)
> >>>
> >>> a <- read.table(.....)
> >>>
> >>> The table has column names like DayOfYear, Quantity, and Category.
> >>>
> >>> The values in the Category column are strings (characters).
> >>>
> >>> I want to get all of the rows grouped by Category. The number of unique
> >>> category names could be around 50. Say for argument's sake the number of
> >>> categories is exactly 50. Can I somehow get a vector of length 50
> >>> containing the rows corresponding to the category (another vector)? I
> >>> realize I can access any row a[i]$Category (right?). But I want a vector
> >>> containing the rows corresponding to each distinct Category name.
> >>>
> >>> Thank you.
> >>>
> >>> Kevin
> >>>
> >>> ______________________________________________
> >>> [email protected] mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >>
> >> --
> >> Jim Holtman
> >> Cincinnati, OH
> >> +1 513 646 9390
> >>
> >> What is the problem you are trying to solve?
> >
>