Re: [R] Reading Multi-value data fields for descriptive analysis

jim holtman Sun, 13 Jul 2008 05:38:42 -0700

This may do what you want:

> # assumes the header row of the file names the columns name, picnic, food, other
> x <- read.table("/tempxx.txt", comment="", quote="", sep="|", header=TRUE,
+     as.is=TRUE)
> # split out by name
> z <- lapply(seq(nrow(x)), function(.row){
+     .result <- NULL
+     # construct the data output
+     for (i in c('picnic', 'food', 'other')){
+         .split <- strsplit(x[.row,][[i]], ";#")
+         .result <- rbind(.result, cbind(name=x[.row,][['name']],
+             field=i, value=unlist(.split)))
+     }
+     .result
+ })
>
>
> z
[[1]]
     name        field    value
[1,] "Yogi Bear" "picnic" "Yes"
[2,] "Yogi Bear" "food"   "Hamburgers"
[3,] "Yogi Bear" "food"   "Hot Dogs"
[4,] "Yogi Bear" "food"   "I rely on others to bring the good stuff"
[5,] "Yogi Bear" "other"  "\"Softball"
[6,] "Yogi Bear" "other"  "Blanket"
[7,] "Yogi Bear" "other"  "I bring boo-boo, but he hides\""

[[2]]
     name      field    value
[1,] "Boo-Boo" "picnic" "Yes"
[2,] "Boo-Boo" "food"   "Potato Salad"
[3,] "Boo-Boo" "food"   "Cole Slaw"
[4,] "Boo-Boo" "food"   "whatever Yogi doesn't eat"
[5,] "Boo-Boo" "other"  "Lawn Chairs"
[6,] "Boo-Boo" "other"  "Blanket"
[7,] "Boo-Boo" "other"  "my running shoes"

[[3]]
     name          field    value
[1,] "Ranger Rick" "picnic" "No"
[2,] "Ranger Rick" "food"   "I told you I don't picnic"
[3,] "Ranger Rick" "other"  "a big net and handcuffs"

[[4]]
      name              field    value
 [1,] "Magilla Gorilla" "picnic" "Yes"
 [2,] "Magilla Gorilla" "food"   "Hamburgers"
 [3,] "Magilla Gorilla" "food"   "Hot Dogs"
 [4,] "Magilla Gorilla" "food"   "Potato Salad"
 [5,] "Magilla Gorilla" "food"   "Cole Slaw"
 [6,] "Magilla Gorilla" "food"   "BBQ Chicken"
 [7,] "Magilla Gorilla" "other"  "Softball"
 [8,] "Magilla Gorilla" "other"  "Volleyball"
 [9,] "Magilla Gorilla" "other"  "Lawn Chairs"
[10,] "Magilla Gorilla" "other"  "Blanket"

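The stray quote characters in the "other" values above (for example "\"Softball") come from quote="", which turns off quote handling altogether. Here is a minimal sketch of an alternative, assuming the file has one record per line and that the header row has been shortened to ID|name|picnic|food|other (the inline `txt` sample and those short names are assumptions for illustration, not the real file). Setting quote="\"" lets read.table strip the double quotes while leaving apostrophes alone, and comment.char="" keeps the "#" in ";#" from being read as the start of a comment.

```r
# Three sample records, one per line, with a shortened header (assumed).
txt <- c(
  "ID|name|picnic|food|other",
  "1|Yogi Bear|Yes|Hamburgers;#Hot Dogs;#I rely on others to bring the good stuff|\"Softball;#Blanket;#I bring boo-boo, but he hides\"",
  "2|Boo-Boo|Yes|Potato Salad;#Cole Slaw;#whatever Yogi doesn't eat|Lawn Chairs;#Blanket;#my running shoes",
  "3|Ranger Rick|No|I told you I don't picnic|a big net and handcuffs"
)

# Only the double quote is treated as a quoting character, so the
# apostrophes in "doesn't" and "don't" are read as ordinary text.
x <- read.table(text = txt, sep = "|", quote = "\"", comment.char = "",
                header = TRUE, as.is = TRUE)

# Split the first respondent's multi-value field on the literal ";#".
other1 <- strsplit(x$other[1], ";#", fixed = TRUE)[[1]]
print(other1)
```

With the quotes stripped on input, the first and last values come back as plain "Softball" and "I bring boo-boo, but he hides".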

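To get at the frequencies Dale asked about, the per-person matrices in z can be stacked into one long table and tabulated. This is only a sketch: the small `z` below is rebuilt by hand from the "food" rows of part of the output above so the example runs on its own, and `choices`, `long`, and `answer` are names made up for illustration. Anything not in the fixed list of foods is counted as an "Other" write-in.

```r
# Hand-built stand-in for z, using the "food" rows of three respondents.
z <- list(
  cbind(name = "Yogi Bear", field = "food",
        value = c("Hamburgers", "Hot Dogs",
                  "I rely on others to bring the good stuff")),
  cbind(name = "Boo-Boo", field = "food",
        value = c("Potato Salad", "Cole Slaw", "whatever Yogi doesn't eat")),
  cbind(name = "Magilla Gorilla", field = "food",
        value = c("Hamburgers", "Hot Dogs", "Potato Salad",
                  "Cole Slaw", "BBQ Chicken"))
)

# Stack the character matrices into one long data frame.
long <- as.data.frame(do.call(rbind, z), stringsAsFactors = FALSE)

# Known checkbox choices; everything else is treated as a write-in.
choices <- c("Hamburgers", "Hot Dogs", "Potato Salad", "Cole Slaw",
             "BBQ Chicken")
long$answer <- ifelse(long$value %in% choices, long$value, "Other")

# Counts per answer, and the percentage of respondents giving each answer.
counts <- table(factor(long$answer, levels = c(choices, "Other")))
n_resp <- length(unique(long$name))
pct    <- round(100 * counts / n_resp, 1)

print(counts)
print(pct)
```

The percentages are meaningful as "share of respondents" only as long as each respondent lists a given answer at most once, which holds for this sample.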

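For the "how many brought X AND Y" questions, one option is a respondent-by-item 0/1 table built from the long format. A rough sketch follows, using a hand-made `long` table that holds only a subset of the values for three respondents from the sample data; the names `tab`, `ind`, and `co` are illustrative.

```r
# Subset of the sample data in long form: one row per respondent and item.
long <- data.frame(
  name  = rep(c("Yogi Bear", "Boo-Boo", "Magilla Gorilla"), each = 4),
  value = c("Hamburgers", "Hot Dogs", "Softball", "Blanket",
            "Potato Salad", "Cole Slaw", "Lawn Chairs", "Blanket",
            "Hamburgers", "Softball", "Volleyball", "Blanket"),
  stringsAsFactors = FALSE
)

# Respondent-by-item indicator matrix: 1 if the item was mentioned, else 0.
tab <- table(long$name, long$value)
ind <- matrix(as.integer(tab > 0), nrow = nrow(tab), dimnames = dimnames(tab))

# Number of respondents who mentioned both items of a pair.
both_hb <- sum(ind[, "Hamburgers"] == 1 & ind[, "Blanket"] == 1)
both_sv <- sum(ind[, "Softball"] == 1 & ind[, "Volleyball"] == 1)

# All pairwise co-occurrence counts at once (item by item).
co <- crossprod(ind)

print(both_hb)
print(both_sv)
print(co["Hamburgers", "Blanket"])
```

The diagonal of `co` holds the plain per-item counts, so the same matrix covers both the single-item frequencies and the pairwise combinations.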
On Sun, Jul 13, 2008 at 12:56 AM, Hohm, Dale <[EMAIL PROTECTED]> wrote:
> Thanks for the reply Jim.
>
> Here is a representation of the data I want to analyze - 10 records as 
> requested.  Each line can easily include an ID number as below.
>
> So I want to determine a frequency or percentage of respondents that bring 
> each of the 5 foods (Hamburgers, Hot Dogs, Potato Salad, Cole Slaw and BBQ 
> Chicken) and how many "Other" write-ins there are.  The same for what else is 
> brought besides food (Softball, Volleyball, Lawn Chairs and Blanket) as well 
> as a count of "Other" write-ins.  I'll also need to be able to discern how 
> many brought Hamburgers AND a Blanket or how many brought a Softball AND a 
> Volleyball etc.
>
> ID|Your Name|Do you picnic?|What is your favorite picnic food?|What do you 
> bring besides food?
> 1|Yogi Bear|Yes|Hamburgers;#Hot Dogs;#I rely on others to bring the good 
> stuff|"Softball;#Blanket;#I bring boo-boo, but he hides"
> 2|Boo-Boo|Yes|Potato Salad;#Cole Slaw;#whatever Yogi doesn't eat|Lawn 
> Chairs;#Blanket;#my running shoes
> 3|Ranger Rick|No|I told you I don't picnic|a big net and handcuffs
> 4|Magilla Gorilla|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ 
> Chicken|Softball;#Volleyball;#Lawn Chairs;#Blanket
> 5|Foghorn Leghorn|Yes|"Hot Dogs;#Cole Slaw;#I say, I say, BBQ 
> Chicken?"|Softball;#Blanket
> 6|Peter Potamus|Yes|"Hamburgers;#Hot Dogs;#anything, just a lot of 
> it"|Softball;#Lawn Chairs;#hot air balloon
> 7|Jonny Quest|No|too busy getting into and out of trouble|Hadji and Bandit
> 8|"Fleegle, Bingo, Drooper and Snorky"|Yes|Hamburgers;#Hot Dogs;#Potato 
> Salad;#Cole Slaw;#A banana split|a laugh track
> 9|George Jetson|No|Mr. Spacely is making me work|Lawn Chairs;#Blanket;#my 
> flying car
> 10|Snagglepuss|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ 
> Chicken|Softball;#Heavens to Murgatroyd!  Exit stage left!
>
> Thanks in advance,
>
> Dale
>
> -----Original Message-----
> From: jim holtman [mailto:[EMAIL PROTECTED]
> Sent: Saturday, July 12, 2008 11:32 AM
> To: Hohm, Dale
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading Multi-value data fields for descriptive analysis
>
> Can you provide a more complete example (say 10 lines) of what the
> input is like. Does each line have a unique index that can be related
> to it?  Do you want to summarize all the multi1-n values of Col2?  Do
> you want to know the percentage of input lines that have a
> Col3/multi-value4 on them?  You could read in the data as you have
> indicated below and add a column that is the record number and
> therefore you would not have to worry about trying to say if it
> existed or not.  For example, you might have:
>
> Rec#|col#|value
> 1|1|single
> 1|2|multi1
> 1|2|multi2
> 1|3|multi1
> 2|1|single
> 3|1|single
> 3|2|multi1
> ....
>
> There are a number of potential ways of representing the data, but a
> lot depends on what you want to do with it, so a more extensive
> example of the input, along with the type of output you would like
> will help in providing an answer.
>
> On Sat, Jul 12, 2008 at 12:37 PM, Hohm, Dale <[EMAIL PROTECTED]> wrote:
>> Hello,
>>
>> I'm looking for help on the best approach to get "multi-value" data fields 
>> into R for simple descriptive analysis.
>>
>> -------------------------------------
>>
>> I am new to this list and new to R, but I really want to get over the hump 
>> and get productive with it.  Some help with how to best get the following 
>> data into R would be greatly appreciated.  I have programming experience and 
>> stale experience with SPSS.
>>
>> I am trying to do some simple descriptive analysis (frequencies, cross-tabs) 
>> of data stored in a Microsoft SharePoint list.  The data can be accessed 
>> with ODBC or it can readily be extracted into an Excel or CSV format.  One 
>> of the challenges with the data is that it uses several "multi-value" fields 
>> (Microsoft Access provides the same data-type).
>>
>> By "multi-value" I mean that multiple responses are packed into a single 
>> data column; the data input form presents a question with several checkboxes 
>> and a free-format write-in response.  The individual values within the data 
>> field are separated with the two characters ";#".  So, the data would be of 
>> the following format (in CSV form with column headers and a tilde as the 
>> field separator):
>>
>> Column1single~Column2multi~Column3multi
>> a sample value~C2 a multi one;#C2 a multi two~C3 a multi one;#C3 a multi 
>> two;#C3 a free-form answer
>>
>>
>> The first approach that comes to mind is to explode the multi-value fields 
>> into unique binary indicator columns and then assign a 0 or 1 to these new 
>> columns in each record based on whether that specific value was present.  
>> This approach is complicated by the free-form answer as the unique columns 
>> could grow very large in number - it might be better to figure out how to 
>> indicate the presence of the free-form value in a data column called "Other" 
>> (or "C2 Other") and then hold the free-form value in a separate column.
>>
>> The data would then look like this...
>>
>> Column1single: a sample value
>> C2 a multi one: 1
>> C2 a multi two: 1
>> C2 a multi three: 0
>> C3 a multi one: 1
>> C3 a multi two: 1
>> C3 a free-form answer: 1
>> C3 another free-form answer: 0
>>
>>
>> Or in the second scenario...
>>
>> Column1single: a sample value
>> C2 a multi one: 1
>> C2 a multi two: 1
>> C2 a multi three: 0
>> C3 a multi one: 1
>> C3 a multi two: 1
>> C3 Other: 1
>> C3 Other Text: a free-form answer
>>
>>
>> I am uncertain how to read this data into R in this format, so suggestions 
>> and examples would help me greatly.
>>
>> This is a pretty common data packing scenario, so perhaps there are better 
>> approaches to reading this data and better ways in R to analyze it than what 
>> I have presented.  Suggestions greatly appreciated.
>>
>>
>> Thanks,
>>
>> Dale Hohm
>>
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>



