Here is a further simplification. We use the colClasses= argument with "NULL" for the columns we do not want so we do not have to later remove those columns.
# record type ("1" or "2") rectype <- substr(input, 7, 7) # read in record type "1" input1 <- input[rectype == "1"] DF1 <- read.fwf(textConnection(input1), widths = c(5, 1, 1, 1, 1), col.names = c("familyid", "", "", "", "dwelling"), colClasses = c("numeric", "NULL", "NULL", "NULL", "numeric")) # read in record type "2" input2 <- input[rectype == "2"] DF2 <- read.fwf(textConnection(input2), widths = c(5, 1, 1, 2, 1, 1), col.names = c("personalid", "", "", "age", "", "sex"), colClasses = c("numeric", "NULL", "NULL", "numeric", "NULL", "numeric")) # ix is the index in DF1 of family row corresponding to each personal row in DF2 ix <- cumsum(rectype == "1")[rectype == "2"] DF <- cbind(DF1[ix,], DF2) DF On Sun, Feb 7, 2010 at 6:30 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote: > Try this. It uses input defined in Jim's post and defines the rectype > of each row ("1" or "2"). It then reads the rectype "1" records into > DF1 using read.fwf and the rectype "2" records into DF2 also using > read.fwf. ix is defined to have one component per personal record > giving the row number in DF1 of the corresponding family. We combine > DF1 and DF2 using ix and remove the column names that start with "X". > > # record type ("1" or "2") > rectype <- substr(input, 7, 7) > > # read in record type "1" > input1 <- input[rectype == "1"] > DF1 <- read.fwf(textConnection(input1), widths = c(5, 1, 1, 1, 1), > col.names = c("familyid", "X", "X", "X", "dwelling")) > > # read in record type "2" > input2 <- input[rectype == "2"] > DF2 <- read.fwf(textConnection(input2), widths = c(5, 1, 1, 2, 1, 1), > col.names = c("personalid", "X", "X", "age", "X", "sex")) > > # ix is the index in DF1 of family row corresponding to each personal row in > DF2 > ix <- cumsum(rectype == "1")[rectype == "2"] > DF <- cbind(DF1[ix,], DF2) > DF <- DF[substr(names(DF), 1, 1) != "X"] > > so DF looks like this: > >> DF > familyid dwelling personalid age sex > 1 6470 1 1 32 0 > 1.1 6470 1 2 30 1 > 2 7470 0 1 40 1 > 3 8470 0 1 27 0 > 4 9470 0 1 13 1 > 4.1 9470 0 2 22 0 > 4.2 9470 0 3 24 1 > 5 10470 1 1 20 0 > 5.1 10470 1 2 11 1 > 6 11470 0 1 17 0 > 6.1 11470 0 2 10 1 > 6.2 11470 0 3 26 1 > > On Sun, Feb 7, 2010 at 10:57 AM, Saba(Home) <saba...@charter.net> wrote: >> >> I would like to read the following hierarchical data set. There is a family >> record followed by one or more personal records. >> If col. 7 is "1" it is a family record. If it is "2" it is a personal >> record. >> The family record is formatted as follows: >> col. 1-5 family id >> col. 7 "1" >> col. 9 dwelling type code >> The personal record is formatted as follows: >> col. 1-5 personal id >> col. 7 "2" >> col. 8-9 age >> col. 11 sex code >> >> The first six family and accompanying personal records look like this: >> 06470 1 1 >> 1 232 0 >> 2 230 1 >> 07470 1 0 >> 1 240 1 >> 08470 1 0 >> 1 227 0 >> 09470 1 0 >> 1 213 1 >> 2 222 0 >> 3 224 1 >> 10470 1 1 >> 1 220 0 >> 2 211 1 >> 11470 1 0 >> 1 217 0 >> 2 210 1 >> 3 226 1 >> >> I want to create a dataset containing >> . family ID >> . dwelling code >> . person ID >> . age >> . sex code >> The dataset will contain one observation per person, and the with family >> information repeated for people in the same family. >> Can anyone help? >> Thanks, >> Richard Saba >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.