Hi fellow R-users,

say I have two groups, A and B. Nested within each group are subgroups, and each subgroup contains objects that take the value x or y on a certain attribute. So I can compute the proportion of x-objects for each subgroup as #x/(#x + #y).

Artificial example with 12 subgroups in group A and 8 subgroups in group B:

    set.seed(123)
    count.x <- NULL
    count.y <- NULL
    j <- 1
    # i is the total number of objects in subgroup j
    for (i in sample(x = 10:20, size = 20, replace = TRUE)) {
      count.x[j] <- sample(x = 0:i, size = 1)
      count.y[j] <- i - count.x[j]
      j <- j + 1
    }
    data <- data.frame(
      x.portion = count.x / (count.x + count.y),
      y.portion = count.y / (count.x + count.y),
      # factor, so that the formula interfaces accept it as a grouping variable
      group     = factor(c(rep("A", 12), rep("B", 8))),
      weight    = count.x + count.y
    )

I am now interested in whether there is a difference in the proportions of x-objects between groups A and B, and I consider it a good idea, as in the example above, to weight by the total number of objects in each subgroup.

The data cannot be regarded as a realization of some normal distribution. Even so, a test based on ranks does not look like a natural solution to this problem. But I guess it is possible. Although Xie & Priebe (2002)* are not exactly aiming at this, their paper might give an idea of what weighting could look like in the special case of the Mann/Whitney/Wilcoxon statistic. (Despite the hint by John & Priebe (2007)** that this "is not a candidate for the practitioner's toolbox".)

Anyway, trying to apply the Wilcoxon rank sum test to weighted data, I was first tempted to replicate each proportion by its weight. (Bad idea: data bloat, ties, and probably a number of even worse problems.)

Function wilcox_test in package coin has a weights argument, but

    library(coin)
    wilcox_test(formula = x.portion ~ group, data = data, weights = ~ weight)

leads to the warning "Rank transformation doesn't take weights into account". Still, the results differ from

    wilcox_test(formula = x.portion ~ group, data = data)

and

    wilcox.test(formula = x.portion ~ group, data = data)

The code in wilcox_test() and the functions it depends on looks a bit intertwined to me, but I guess it is not what I am after.

tl;dr: I am looking for a nonparametric alternative to wtd.t.test in package weights.
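
For illustration, the replication idea I dismissed above could be sketched roughly as follows. This is only a sketch in base R under my own assumptions: the names totals and expanded and the choice of exact = FALSE (normal approximation, since the replicated rows are heavily tied) are mine. The replicated rows are not independent observations, so the resulting p-value should not be read as a valid weighted test.

```r
# Rebuild the artificial example: 12 subgroups in group A, 8 in group B.
set.seed(123)
totals  <- sample(x = 10:20, size = 20, replace = TRUE)  # objects per subgroup
count.x <- vapply(totals, function(n) sample(x = 0:n, size = 1), integer(1))

data <- data.frame(
  x.portion = count.x / totals,
  group     = factor(rep(c("A", "B"), times = c(12, 8))),
  weight    = totals
)

# Naive weighting: repeat every subgroup row 'weight' times.
# 'expanded' is a hypothetical helper name, not part of any package.
expanded <- data[rep(seq_len(nrow(data)), times = data$weight), ]

# Wilcoxon rank sum test on the bloated data. exact = FALSE requests the
# normal approximation, because an exact p-value is unavailable with ties.
res <- wilcox.test(x.portion ~ group, data = expanded, exact = FALSE)
print(res)
```

The bloated data frame has sum(data$weight) rows instead of 20, which shows where the data bloat and the ties come from.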

Is anyone aware of an(other) implementation in R?

Cheers,

Alex

PS: In reality it is a bit trickier, as there are more than two values and maybe even more than two groups. So a Kruskal-Wallis test might be of interest, but I thought I would keep it simple for the moment.

* Jingdong Xie & Carey E. Priebe: A weighted generalization of the Mann–Whitney–Wilcoxon statistic. In: Journal of Statistical Planning and Inference 102 (2), 2002-04-01, pages 441–466. (http://dx.doi.org/10.1016/S0378-3758(01)00111-2.)

** Majnu John & Carey E. Priebe: A data-adaptive methodology for finding an optimal weighted generalized Mann-Whitney-Wilcoxon statistic. In: Computational Statistics & Data Analysis 51 (9), 2007-05-15, pages 4337–4353. (http://dx.doi.org/10.1016/j.csda.2006.06.003.)

--
Alexander Sommer
Research associate
Technische Universität Dortmund
Fakultät Erziehungswissenschaft, Psychologie und Soziologie
Forschungsverbund Deutsches Jugendinstitut/Technische Universität Dortmund
Vogelpothsweg 78
44227 Dortmund
Phone: +49 231 755-8189
Fax: +49 231 755-6553
Email: alexander.som...@tu-dortmund.de
WWW: http://www.forschungsverbund.tu-dortmund.de/