I am using a simple R statement to read in the file: a <- read.csv("Sample.dat", header=TRUE)
There is a lot of data, but the first few lines look like:

DayOfYear,Quantity,Fraction,Category,SubCategory
1,82,0.0000390392720794458,(Unknown),(Unknown)
2,78,0.0000371349173438631,(Unknown),(Unknown)
.
.
.
71,2,0.0000009521773677913,WOMEN,Piratesses
72,4,0.0000019043547355827,WOMEN,Piratesses
73,3,0.0000014282660516870,WOMEN,Piratesses
74,14,0.0000066652415745395,WOMEN,Piratesses
75,2,0.0000009521773677913,WOMEN,Piratesses

If I read the data in as above, the command

a[1]

results in the output

[ reached getOption("max.print") -- omitted 16193 rows ]]

Shouldn't this be the first row?

a$Category[1]

results in the output

[1] (Unknown)
4464 Levels: Tags ... WOMEN

But

a$Category[365]

gives me:

[1] 7 Plates (Dessert),Western\n120,5,0.0000023804434194784,7 Plates (Dessert)
4464 Levels: Tags ... WOMEN

There is something fundamental about either vectors or the read.csv command that I am missing here.

Thank you.

Kevin

---- jim holtman <[EMAIL PROTECTED]> wrote:
> Please provide commented, minimal, self-contained, reproducible code,
> or at least a before/after of what your data would look like.
> Taking a guess at what you are asking, here is one way of doing it:
>
> > x <- data.frame(cat=sample(LETTERS[1:3],20,TRUE),a=1:20, b=runif(20))
> > x
>    cat  a          b
> 1    B  1 0.65472393
> 2    C  2 0.35319727
> 3    B  3 0.27026015
> 4    A  4 0.99268406
> 5    C  5 0.63349326
> 6    A  6 0.21320814
> 7    C  7 0.12937235
> 8    A  8 0.47811803
> 9    A  9 0.92407447
> 10   A 10 0.59876097
> 11   A 11 0.97617069
> 12   A 12 0.73179251
> 13   B 13 0.35672691
> 14   C 14 0.43147369
> 15   C 15 0.14821156
> 16   C 16 0.01307758
> 17   B 17 0.71556607
> 18   B 18 0.10318424
> 19   C 19 0.44628435
> 20   B 20 0.64010105
> > # create a list of the indices of the data grouped by 'cat'
> > split(seq(nrow(x)), x$cat)
> $A
> [1]  4  6  8  9 10 11 12
>
> $B
> [1]  1  3 13 17 18 20
>
> $C
> [1]  2  5  7 14 15 16 19
>
> > # or do you want the data
> > split(x, x$cat)
> $A
>    cat  a         b
> 4    A  4 0.9926841
> 6    A  6 0.2132081
> 8    A  8 0.4781180
> 9    A  9 0.9240745
> 10   A 10 0.5987610
> 11   A 11 0.9761707
> 12   A 12 0.7317925
>
> $B
>    cat  a         b
> 1    B  1 0.6547239
> 3    B  3 0.2702601
> 13   B 13 0.3567269
> 17   B 17 0.7155661
> 18   B 18 0.1031842
> 20   B 20 0.6401010
>
> $C
>    cat  a          b
> 2    C  2 0.35319727
> 5    C  5 0.63349326
> 7    C  7 0.12937235
> 14   C 14 0.43147369
> 15   C 15 0.14821156
> 16   C 16 0.01307758
> 19   C 19 0.44628435
>
> On Sat, Jul 12, 2008 at 3:32 AM, <[EMAIL PROTECTED]> wrote:
> > I have searched the archive and I could not find what I need so I will try to
> > ask the question here.
> >
> > I read a table in (read.table)
> >
> > a <- read.table(.....)
> >
> > The table has column names like DayOfYear, Quantity, and Category.
> >
> > The values in the row for Category are strings (characters).
> >
> > I want to get all of the rows grouped by Category. The number of unique
> > category names could be around 50. Say for argument's sake the number of
> > categories is exactly 50. Can I somehow get a vector of length 50
> > containing the rows corresponding to the category (another vector)? I
> > realize I can access any row a[i]$Category (right?).
> > But I want a vector containing the rows corresponding to each distinct
> > Category name.
> >
> > Thank you.
> >
> > Kevin
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
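
A possible explanation for both surprises in the newest message, offered as a sketch rather than a confirmed diagnosis. For a data frame, a[1] selects the first column (as a one-column data frame), while a[1, ] selects the first row. The Category value containing "\n120,5,..." looks like what read.csv can produce when a field contains a double quote (for example an inch mark): with the default quote setting, everything up to the next double quote is read as a single field. The sample rows below are hypothetical stand-ins for Sample.dat, and the inch marks in them are an assumption, since the actual file is not shown.

```r
# Hypothetical stand-in for Sample.dat: same columns as in the post,
# with an ASSUMED double quote (inch mark) in two Category values.
csv_lines <- c(
  "DayOfYear,Quantity,Fraction,Category,SubCategory",
  "1,82,0.0000390392720794458,(Unknown),(Unknown)",
  "119,3,0.0000014282660516870,7\" Plates (Dessert),Western",
  "120,5,0.0000023804434194784,7\" Plates (Dessert),Western",
  "71,2,0.0000009521773677913,WOMEN,Piratesses"
)

# Default quoting: the first inch mark opens a quoted field that runs
# on to the next one, so neighbouring records get merged.
bad <- read.csv(text = csv_lines, header = TRUE)

# quote = "" makes read.csv treat the double quote as ordinary text,
# so each input line becomes one row.
good <- read.csv(text = csv_lines, header = TRUE, quote = "")

# Indexing a data frame:
first_column <- good[1]     # one-column data frame (DayOfYear)
first_row    <- good[1, ]   # one-row data frame (the first record)
first_value  <- as.character(good$Category[1])  # a single Category value

print(nrow(bad))
print(nrow(good))
print(first_row)
```

If the real file relies on double quotes to protect embedded commas, quote = "" would split those fields instead, so it is worth inspecting a few raw lines first, for example with readLines("Sample.dat", n = 5).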