Hi All, today I have a more general question concerning the approach of storing different values from the analysis of multiple variables.
My task is to compare distributions in a universe (population) with the distributions among the respondents across a large number of variables. The comparison is to be done on relative frequencies (proportions).

I was thinking about the structure I should store the results in and came up with the following:

-- cut --

library(ggplot2)
library(stringi)

# Result data frame
# Some sort of tidy data set where
# each value is stored in its own row.
# This way all values for all variables could be stored in
# one single data structure.
# If a further column were added for the name of the
# study, one could also build a result data set across
# surveys.
# Values for measure could be "number" for 'raw' values or
# "freq" for frequencies/counts.
# Values for unit could be "n" for 'numbers' and
# "%" for percentages.
d_test <- data.frame(
  group = rep(c("Universe", "Respondents"), each = 16),
  variable = rep("State", 32),
  value = rep(c(11.3, 12.7, 3.3, 5, 0.6, 8.1, 6.2, 5.8,
                6.4, 14.5, 8.3, 0.3, 3.8, 2.5, 8.1, 3), 2),
  label = rep(c("Baden-Wuerttemberg", "Bayern", "Berlin",
                "Brandenburg", "Bremen", "Hamburg", "Hessen",
                "Mecklenburg-Vorpommern", "Niedersachsen",
                "Nordrhein-Westfalen", "Rheinland-Pfalz",
                "Saarland", "Sachsen", "Sachsen-Anhalt",
                "Schleswig-Holstein", "Thueringen"), 2),
  measure = rep("freq", 32),
  unit = rep("%", 32),
  stringsAsFactors = FALSE
)

# This way the variables can be selected using simple
# value selection from Base R functionality.
data <- d_test[d_test$variable == "State", ]

# And plot results for every variable.
# value is numeric, so the y axis needs a continuous scale.
ggplot(
  data = data,
  aes(x = label, y = value, fill = group)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_discrete(name = stringi::stri_trans_totitle(names(data)[1])) +
  scale_x_discrete(name = data$variable[1]) +
  scale_y_continuous(name = data$unit[1])

-- cut --

The reporting / presentation is done in R Markdown.
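A minimal sketch of how rows in this structure could be filled from raw answers, using only base R. The helper name `to_result_rows` and the toy vectors are made up for illustration and are not part of the actual project; the sketch assumes one categorical vector per group.

```r
# Hypothetical helper (an assumption, not part of the original code):
# turn one raw categorical vector into rows of the long result structure.
to_result_rows <- function(x, group, variable) {
  counts <- table(x)                         # absolute counts per category
  pct <- round(100 * prop.table(counts), 1)  # relative frequencies in percent
  data.frame(
    group = group,
    variable = variable,
    value = as.numeric(pct),
    label = names(pct),
    measure = "freq",
    unit = "%",
    stringsAsFactors = FALSE
  )
}

# Made-up raw data, for illustration only.
universe    <- c("Bayern", "Bayern", "Berlin", "Hessen")
respondents <- c("Bayern", "Berlin", "Berlin", "Hessen", "Hessen")

# Stack both groups into one result data frame.
d_result <- rbind(
  to_result_rows(universe, "Universe", "State"),
  to_result_rows(respondents, "Respondents", "State")
)
d_result
```

Note that a category missing from one group would simply have no row for that group; converting the raw vectors to factors with a common set of levels before calling `table()` would keep such categories with a value of 0.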
I would load the result data set once at the beginning and then run the comparisons as plots for each variable named in the "variable" column of the result data set.

If I follow this approach for my customer relationship survey, do you think I would face drawbacks or run into serious trouble? I am interested in your opinion and open to other approaches and suggestions.

Kind regards

Georg

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.