This may do what you want:

> x <- read.table("/tempxx.txt", comment="", quote="", sep="|", header=TRUE,
+     as.is=TRUE)
> # split out by name
> z <- lapply(seq(nrow(x)), function(.row){
+     .result <- NULL
+     # construct the data output
+     for (i in c('picnic', 'food', 'other')){
+         .split <- strsplit(x[.row,][[i]], ";#")
+         .result <- rbind(.result, cbind(name=x[.row,][['name']], field=i, value=unlist(.split)))
+     }
+     .result
+ })
>
>
> z
[[1]]
     name        field    value
[1,] "Yogi Bear" "picnic" "Yes"
[2,] "Yogi Bear" "food"   "Hamburgers"
[3,] "Yogi Bear" "food"   "Hot Dogs"
[4,] "Yogi Bear" "food"   "I rely on others to bring the good stuff"
[5,] "Yogi Bear" "other"  "\"Softball"
[6,] "Yogi Bear" "other"  "Blanket"
[7,] "Yogi Bear" "other"  "I bring boo-boo, but he hides\""

[[2]]
     name      field    value
[1,] "Boo-Boo" "picnic" "Yes"
[2,] "Boo-Boo" "food"   "Potato Salad"
[3,] "Boo-Boo" "food"   "Cole Slaw"
[4,] "Boo-Boo" "food"   "whatever Yogi doesn't eat"
[5,] "Boo-Boo" "other"  "Lawn Chairs"
[6,] "Boo-Boo" "other"  "Blanket"
[7,] "Boo-Boo" "other"  "my running shoes"

[[3]]
     name          field    value
[1,] "Ranger Rick" "picnic" "No"
[2,] "Ranger Rick" "food"   "I told you I don't picnic"
[3,] "Ranger Rick" "other"  "a big net and handcuffs"

[[4]]
      name              field    value
 [1,] "Magilla Gorilla" "picnic" "Yes"
 [2,] "Magilla Gorilla" "food"   "Hamburgers"
 [3,] "Magilla Gorilla" "food"   "Hot Dogs"
 [4,] "Magilla Gorilla" "food"   "Potato Salad"
 [5,] "Magilla Gorilla" "food"   "Cole Slaw"
 [6,] "Magilla Gorilla" "food"   "BBQ Chicken"
 [7,] "Magilla Gorilla" "other"  "Softball"
 [8,] "Magilla Gorilla" "other"  "Volleyball"
 [9,] "Magilla Gorilla" "other"  "Lawn Chairs"
[10,] "Magilla Gorilla" "other"  "Blanket"


On Sun, Jul 13, 2008 at 12:56 AM, Hohm, Dale <[EMAIL PROTECTED]> wrote:
> Thanks for the reply Jim.
>
> Here is a representation of the data I want to analyze - 10 records as
> requested. Each line can easily include an ID number as below.
>
> So I want to determine a frequency or percentage of respondents that bring
> each of the 5 foods (Hamburgers, Hot Dogs, Potato Salad, Cole Slaw and BBQ
> Chicken) and how many "Other" write-ins there are. The same for what else is
> brought besides food (Softball, Volleyball, Lawn Chairs and Blanket) as well
> as a count of "Other" write-ins. I'll also need to be able to discern how
> many brought Hamburgers AND a Blanket or how many brought a Softball AND a
> Volleyball etc.
>
> ID|Your Name|Do you picnic?|What is your favorite picnic food?|What do you bring besides food?
> 1|Yogi Bear|Yes|Hamburgers;#Hot Dogs;#I rely on others to bring the good stuff|"Softball;#Blanket;#I bring boo-boo, but he hides"
> 2|Boo-Boo|Yes|Potato Salad;#Cole Slaw;#whatever Yogi doesn't eat|Lawn Chairs;#Blanket;#my running shoes
> 3|Ranger Rick|No|I told you I don't picnic|a big net and handcuffs
> 4|Magilla Gorilla|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken|Softball;#Volleyball;#Lawn Chairs;#Blanket
> 5|Foghorn Leghorn|Yes|"Hot Dogs;#Cole Slaw;#I say, I say, BBQ Chicken?"|Softball;#Blanket
> 6|Peter Potamus|Yes|"Hamburgers;#Hot Dogs;#anything, just a lot of it"|Softball;#Lawn Chairs;#hot air balloon
> 7|Jonny Quest|No|too busy getting into and out of trouble|Hadji and Bandit
> 8|"Fleegle, Bingo, Drooper and Snorky"|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#A banana split|a laugh track
> 9|George Jetson|No|Mr. Spacely is making me work|Lawn Chairs;#Blanket;#my flying car
> 10|Snagglepuss|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken|Softball;#Heavens to Murgatroyd! Exit stage left!
>
> Thanks in advance,
>
> Dale
>
> -----Original Message-----
> From: jim holtman [mailto:[EMAIL PROTECTED]
> Sent: Saturday, July 12, 2008 11:32 AM
> To: Hohm, Dale
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading Multi-value data fields for descriptive analysis
>
> Can you provide a more complete example (say 10 lines) of what the
> input is like?  Does each line have a unique index that can be related
> to it?  Do you want to summarize all the multi1-n values of Col2?  Do
> you want to know the percentage of input lines that have a
> Col3/multi-value4 on them?  You could read in the data as you have
> indicated below and add a column that is the record number and
> therefore you would not have to worry about trying to say if it
> existed or not.  For example, you might have:
>
> Rec#|col#|value
> 1|1|single
> 1|2|multi1
> 1|2|multi2
> 1|3|multi1
> 2|1|single
> 3|1|single
> 3|2|multi1
> ....
>
> There are a number of potential ways of representing the data, but a
> lot depends on what you want to do with it, so a more extensive
> example of the input, along with the type of output you would like
> will help in providing an answer.
>
> On Sat, Jul 12, 2008 at 12:37 PM, Hohm, Dale <[EMAIL PROTECTED]> wrote:
>> Hello,
>>
>> I'm looking for help on the best approach to get "multi-value" data fields
>> into R for simple descriptive analysis.
>>
>> -------------------------------------
>>
>> I am new to this list and new to R, but I really want to get over the hump
>> and get productive with it.  Some help with how to best get the following
>> data into R would be greatly appreciated.  I have programming experience and
>> stale experience with SPSS.
>>
>> I am trying to do some simple descriptive analysis (frequencies, cross-tabs)
>> of data stored in a Microsoft SharePoint list.  The data can be accessed
>> with ODBC or it can readily be extracted into an Excel or CSV format.  One
>> of the challenges with the data is that it uses several "multi-value" fields
>> (Microsoft Access provides the same data-type).
>>
>> By "multi-value" I mean that multiple responses are packed into a single
>> data column; the data input form presents a question with several checkboxes
>> and a free-format write-in response.  The individual values within the data
>> field are separated with the two characters ";#".  So, the data would be of
>> the following format (in CSV form with column headers and a tilde as the
>> field separator):
>>
>> Column1single~Column2multi~Column3multi
>> a sample value~C2 a multi one;#C2 a multi two~C3 a multi one;#C3 a multi two;#C3 a free-form answer
>>
>>
>> The first approach that comes to mind is to explode the multi-value fields
>> into unique bi-variate data columns and then assign a 0 or 1 to these new
>> columns in each record based on whether that specific value was present.
>> This approach is complicated by the free-form answer as the unique columns
>> could grow very large in number - it might be better to figure out how to
>> indicate the presence of the free-form value in a data column called "Other"
>> (or "C2 Other") and then hold the free-form value in a separate column.
>>
>> The data would then look like this...
>>
>> Column1single: a sample value
>> C2 a multi one: 1
>> C2 a multi two: 1
>> C2 a multi three: 0
>> C3 a multi one: 1
>> C3 a multi two: 1
>> C3 a free-form answer: 1
>> C3 another free-form answer: 0
>>
>>
>> Or in the second scenario...
>>
>> Column1single: a sample value
>> C2 a multi one: 1
>> C2 a multi two: 1
>> C2 a multi three: 0
>> C3 a multi one: 1
>> C3 a multi two: 1
>> C3 Other: 1
>> C3 Other Text: a free-form answer
>>
>>
>> I am uncertain how to read this data into R in this format, so suggestions
>> and examples would help me greatly.
>>
>> This is a pretty common data packing scenario, so perhaps there are better
>> approaches to reading this data and better ways in R to analyze it than what
>> I have presented.  Suggestions greatly appreciated.
>>
>>
>> Thanks,
>>
>> Dale Hohm
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
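
Following on from the long-format (name, field, value) layout used in the reply at the top of this thread, here is a rough sketch of how the frequencies and the "brought A AND B" counts Dale asked about might be computed. It is an illustration under assumptions, not code from the thread: the data frame `raw`, the helper `explode()`, the list `known` and the column names `name`, `food` and `other` are made up for the example, only the first four posted records are retyped (without the stray quote characters), and any value that is not one of the listed checkbox choices is recoded as "Other".

```r
# Hypothetical input: the first four posted records, retyped by hand.
raw <- data.frame(
  name  = c("Yogi Bear", "Boo-Boo", "Ranger Rick", "Magilla Gorilla"),
  food  = c("Hamburgers;#Hot Dogs;#I rely on others to bring the good stuff",
            "Potato Salad;#Cole Slaw;#whatever Yogi doesn't eat",
            "I told you I don't picnic",
            "Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken"),
  other = c("Softball;#Blanket;#I bring boo-boo, but he hides",
            "Lawn Chairs;#Blanket;#my running shoes",
            "a big net and handcuffs",
            "Softball;#Volleyball;#Lawn Chairs;#Blanket"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per (respondent, value) for a single field.
explode <- function(df, field) {
  parts <- strsplit(df[[field]], ";#", fixed = TRUE)
  data.frame(name  = rep(df$name, lengths(parts)),
             field = field,
             value = unlist(parts),
             stringsAsFactors = FALSE)
}
long <- rbind(explode(raw, "food"), explode(raw, "other"))

# Checkbox choices as listed in Dale's message; everything else is a write-in.
known <- list(
  food  = c("Hamburgers", "Hot Dogs", "Potato Salad", "Cole Slaw", "BBQ Chicken"),
  other = c("Softball", "Volleyball", "Lawn Chairs", "Blanket")
)
is_known <- unname(mapply(function(f, v) v %in% known[[f]],
                          long$field, long$value))
long$choice <- ifelse(is_known, long$value, "Other")

# Counts per question and choice, and the same as a percentage of respondents.
freq <- table(long$field, long$choice)
pct  <- 100 * freq / nrow(raw)

# Respondent-by-choice incidence matrix for "A AND B" questions.
brought <- table(long$name, long$choice) > 0
n_burger_and_blanket <- sum(brought[, "Hamburgers"] & brought[, "Blanket"])
n_soft_and_volley    <- sum(brought[, "Softball"] & brought[, "Volleyball"])
```

Printing `freq` or `pct` gives the per-question tables. The incidence matrix `brought` is essentially the 0/1 "exploded columns" layout from the original question, built after the fact, so write-ins can stay as text in `long$value` while still being counted under "Other".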