Hi Bill,

Thank you for your suggestion. I shall try running the code and test as you suggest.

Is there a straightforward way to routinely test the structure of a complex survey data set such as this? For example, with a multinomial choice model such as this, the data are correct only if each observation set of, say, 6 choices has exactly 1 choice selected and 1 non-missing month. This can, however, be difficult to check when dealing with very large datasets. Presumably, if more than one choice in a set is positive, this will show up as the model failing to converge due to singularity, but it should have been detected at the data cleaning stage.

Best wishes

Graham

-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: 06 April 2013 22:49
To: Rui Barradas; Leask, Graham
Cc: r-help@r-project.org
Subject: RE: [R] Replace missing value within group with non-missing value

> Anyway, try replacing the lapply instruction with this.
>
> tmp <- lapply(sp, function(x){
>     idx <- which(!is.na(x$mth))[1]
>     if(length(idx) > 0)
>         x$mth <- x$mth[idx]
>     x
> })

Note that which(anyLogicalVector)[1] always has length 1, because of the subscript [1], so the 'if' statement may as well be omitted.

There are 3 cases the above code does not detect or deal with.
   (a) nrow(x)==0
   (b) all(is.na(x$mth))
   (c) length(which(!is.na(x$mth))) > 1
Case (a) causes the function to stop in the way you saw:

   > f <- function(x) { # the function passed to lapply
   +     idx <- which(!is.na(x$mth))[1]
   +     if (length(idx) > 0)
   +         x$mth <- x$mth[idx]
   +     x
   + }
   > f(data.frame(mth=integer()))
   Error in `$<-.data.frame`(`*tmp*`, "mth", value = NA_integer_) :
     replacement has 1 rows, data has 0

but (b) and (c) may indicate some errors in your data and cause some surprises down the line.
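
One plausible source of the empty groups in case (a), assuming the full data set contains dn/obs combinations that never occur (for example, doctors with different numbers of observations), is split() itself: given a list of grouping variables it returns one element per combination of levels, including combinations with no rows. A minimal sketch with made-up data (the values are hypothetical, not taken from the thread) shows the effect and the drop = TRUE argument of split(), which discards the empty combinations:

```r
# Hypothetical toy data: doctor 4 has two choice sets, doctor 5 has only one
dat <- data.frame(dn  = c(4, 4, 4, 4, 5, 5),
                  obs = c(1, 1, 2, 2, 1, 1),
                  mth = c(NA, 487, 488, NA, NA, 490))

# Default: every dn/obs combination is kept, so dn 5 / obs 2 is a 0-row group
sp_all <- split(dat, list(dat$dn, dat$obs))
print(sapply(sp_all, nrow))

# drop = TRUE keeps only the combinations that actually occur in the data
sp <- split(dat, list(dat$dn, dat$obs), drop = TRUE)
print(sapply(sp, nrow))

# The fill-in function from the thread now runs without the
# "replacement has 1 rows, data has 0" error
tmp <- lapply(sp, function(x) {
  idx <- which(!is.na(x$mth))[1]   # position of the first non-missing month
  x$mth <- x$mth[idx]              # recycle it over the whole group
  x
})
res <- do.call(rbind, tmp)
print(res)
```

Whether this is what happened with the full data set cannot be confirmed from the 50 rows posted; checking any(sapply(sp, nrow) == 0) on the real split would show it.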
   > f(data.frame(mth=c(NA,NA)))
     mth
   1  NA
   2  NA
   > f(data.frame(mth=c(NA,2,3)))
     mth
   1   2
   2   2
   3   2

You could have your code check whether there is exactly one non-missing value for mth in each non-empty group and warn if that assumption is not true for some group (but also return some reasonable result). The following does that:

   f2 <- function (x) {
       idx <- !is.na(x$mth) # logical vector with length nrow(x)
       nNotNA <- sum(idx)
       if (nNotNA > 1) {
           warning("more than one non-missing mth value in group, using the first")
           idx[cumsum(idx) > 1] <- FALSE
       } else if (nrow(x) > 0 && nNotNA == 0) {
           warning("no non-missing values in group, all mth values will be NA")
           idx[1] <- TRUE
       }
       x$mth <- x$mth[idx]
       x
   }

The warning messages do not say where in 'sp' the problem arose. You could change your lapply call so the group number was in the warning:

   lapply(seq_along(sp), function(i) {
       x <- sp[[i]]
       ... same code as in f2, but add the group number, i, to the end of warnings ...
           warning("more than one ... in group number ", i)
       ...
   })

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rui Barradas
> Sent: Saturday, April 06, 2013 10:24 AM
> To: Leask, Graham
> Cc: r-help@r-project.org
> Subject: Re: [R] Replace missing value within group with non-missing value
>
> Hello,
>
> I've just run my code with your data and found no error. Anyway, try
> replacing the lapply instruction with this.
>
>
> tmp <- lapply(sp, function(x){
>     idx <- which(!is.na(x$mth))[1]
>     if(length(idx) > 0)
>         x$mth <- x$mth[idx]
>     x
> })
>
>
> Rui Barradas
>
> On 06-04-2013 18:12, Leask, Graham wrote:
> > Hi Arun,
> >
> > How odd. Directly pasting the code from your email precisely repeats the error.
> > See below. Any thoughts on the cause of this anomaly?
> >
> >> dput(head(dat,50))
> > structure(list(dn = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), obs = c(1, 1, 1, 1, 1, 1, 2, 2,
> > 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6,
> > 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9), choice =
> > c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
> > 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
> > 0, 0, 1, 0, 0), br = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3,
> > 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2,
> > 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2), mth = c(NA, NA, NA, NA, NA,
> > 487, NA, NA, 488, NA, NA, NA, NA, NA, NA, NA, NA, 488, NA, NA, 489,
> > NA, NA, NA, NA, NA, NA, NA, NA, 489, NA, NA, NA, NA, NA, 489, NA,
> > NA, NA, NA, NA, 490, NA, NA, NA, NA, NA, 491, NA, NA)), .Names =
> > c("dn", "obs", "choice", "br", "mth"), row.names = c("1", "2", "3",
> > "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
> > "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
> > "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37",
> > "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48",
> > "49", "50"), class = "data.frame")
> >> sp <- split(dat, list(dat$dn, dat$obs))
> >> names(sp) <- NULL
> >> tmp <- lapply(sp, function(x){
> > +     idx <- which(!is.na(x$mth))[1]
> > +     x$mth <- x$mth[idx]
> > +     x
> > + })
> > Error in `$<-.data.frame`(`*tmp*`, "mth", value = NA_real_) :
> >   replacement has 1 rows, data has 0
> >> head(do.call(rbind, tmp),7)
> > Error in do.call(rbind, tmp) : object 'tmp' not found
> >
> > Best wishes
> >
> > Graham
> >
> > -----Original Message-----
> > From: arun [mailto:smartpink...@yahoo.com]
> > Sent: 06 April 2013 17:25
> > To: Leask, Graham
> > Cc: Rui Barradas
> > Subject: Re: [R] Replace missing value within group with non-missing value
> >
> > Hello,
> > By running Rui's code, I am getting this:
> > sp <- split(dat, list(dat$dn, dat$obs))
> > names(sp) <- NULL
> > tmp <- lapply(sp, function(x){
> >     idx <- which(!is.na(x$mth))[1]
> >     x$mth <- x$mth[idx]
> >     x
> > })
> > head(do.call(rbind, tmp),7)
> >   dn obs choice br mth
> > 1  4   1      0  1 487
> > 2  4   1      0  2 487
> > 3  4   1      0  3 487
> > 4  4   1      0  4 487
> > 5  4   1      0  5 487
> > 6  4   1      1  6 487
> > 7  4   2      0  1 488
> >
> > Couldn't reproduce the error you cited.
> > A.K.
> >
> > ----- Original Message -----
> > From: "Leask, Graham" <g.le...@aston.ac.uk>
> > To: Rui Barradas <ruipbarra...@sapo.pt>
> > Cc: "r-help@r-project.org" <r-help@r-project.org>
> > Sent: Saturday, April 6, 2013 12:16 PM
> > Subject: Re: [R] Replace missing value within group with non-missing value
> >
> > Hi Rui,
> >
> > Data as follows
> >
> > structure(list(dn = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), obs = c(1, 1, 1, 1, 1, 1, 2, 2,
> > 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6,
> > 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9), choice =
> > c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
> > 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
> > 0, 0, 1, 0, 0), br = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3,
> > 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2,
> > 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2), mth = c(NA, NA, NA, NA, NA,
> > 487, NA, NA, 488, NA, NA, NA, NA, NA, NA, NA, NA, 488, NA, NA, 489,
> > NA, NA, NA, NA, NA, NA, NA, NA, 489, NA, NA, NA, NA, NA, 489, NA,
> > NA, NA, NA, NA, 490, NA, NA, NA, NA, NA, 491, NA, NA)), .Names =
> > c("dn", "obs", "choice", "br", "mth"), row.names = c("1", "2", "3",
> > "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
> > "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
> > "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37",
> > "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48",
> > "49", "50"), class = "data.frame")
> >
> > Best wishes
> >
> > Graham
> >
> > -----Original Message-----
> > From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
> > Sent: 06 April 2013 16:32
> > To: Leask, Graham
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Replace missing value within group with non-missing value
> >
> > Hello,
> >
> > Can't you post a data example? If your dataset is named 'dat' use
> >
> > dput(head(dat, 50)) # paste the output of this in a post
> >
> >
> > Rui Barradas
> >
> > On 06-04-2013 15:34, Leask, Graham wrote:
> >> Hi Rui,
> >>
> >> Thank you for your suggestion which is very much appreciated.
> >> Unfortunately running this code produces the following error.
> >>
> >> error in '$<-.data.frame' ('*tmp*', "mth", value = NA_real_) :
> >>   replacement has 1 rows, data has 0
> >>
> >> I'm sure there must be an elegant solution to this problem?
> >>
> >> Best wishes
> >>
> >> Graham
> >>
> >> On 6 Apr 2013, at 12:15, "Rui Barradas" <ruipbarra...@sapo.pt> wrote:
> >>
> >>> Hello,
> >>>
> >>> That's not a very good way of posting your data, preferably paste
> >>> the output of ?dput in a post.
> >>> Something along the lines of the following might do what you want.
> >>> It seems that the groups are established by 'dn' and 'obs' numbers.
> >>> If so, try
> >>>
> >>>
> >>> # Make up some data
> >>> dat <- data.frame(dn = 4, obs = rep(1:5, each = 6), mth = NA)
> >>> dat$mth[6] <- 487
> >>> dat$mth[9] <- 488
> >>> dat$mth[18] <- 488
> >>> dat$mth[21] <- 489
> >>> dat$mth[30] <- 489
> >>>
> >>>
> >>> sp <- split(dat, list(dat$dn, dat$obs))
> >>> names(sp) <- NULL
> >>> tmp <- lapply(sp, function(x){
> >>>     idx <- which(!is.na(x$mth))[1]
> >>>     x$mth <- x$mth[idx]
> >>>     x
> >>> })
> >>> do.call(rbind, tmp)
> >>>
> >>>
> >>> Hope this helps,
> >>>
> >>> Rui Barradas
> >>>
> >>>
> >>> On 06-04-2013 11:33, Leask, Graham wrote:
> >>>> Dear List members
> >>>>
> >>>> I have a large dataset organised in choice groups, see sample below
> >>>>
> >>>>     +---------------------------------------------------------------------------------------+
> >>>>     | dn  obs  choice  acid     br                date      cdate  situat~n  mth  year  set |
> >>>>     |---------------------------------------------------------------------------------------|
> >>>>  1. |  4    1       0  LOSEC     1                   .          .              .     .    1 |
> >>>>  2. |  4    1       0  NEXIUM    2                   .          .              .     .    1 |
> >>>>  3. |  4    1       0  PARIET    3                   .          .              .     .    1 |
> >>>>  4. |  4    1       0  PROTIUM   4                   .          .              .     .    1 |
> >>>>  5. |  4    1       0  ZANTAC    5                   .          .              .     .    1 |
> >>>>     |---------------------------------------------------------------------------------------|
> >>>>  6. |  4    1       1  ZOTON     6  23aug2000 01:00:00  23aug2000        NS  487  2000    1 |
> >>>>  7. |  4    2       0  LOSEC     1                   .          .              .     .    2 |
> >>>>  8. |  4    2       0  NEXIUM    2                   .          .              .     .    2 |
> >>>>  9. |  4    2       1  PARIET    3  25sep2000 01:00:00  25sep2000         L  488  2000    2 |
> >>>> 10. |  4    2       0  PROTIUM   4                   .          .              .     .    2 |
> >>>>     |---------------------------------------------------------------------------------------|
> >>>> 11. |  4    2       0  ZANTAC    5                   .          .              .     .    2 |
> >>>> 12. |  4    2       0  ZOTON     6                   .          .              .     .    2 |
> >>>> 13. |  4    3       0  LOSEC     1                   .          .              .     .    3 |
> >>>> 14. |  4    3       0  NEXIUM    2                   .          .              .     .    3 |
> >>>> 15. |  4    3       0  PARIET    3                   .          .              .     .    3 |
> >>>>     |---------------------------------------------------------------------------------------|
> >>>> 16. |  4    3       0  PROTIUM   4                   .          .              .     .    3 |
> >>>> 17. |  4    3       0  ZANTAC    5                   .          .              .     .    3 |
> >>>> 18. |  4    3       1  ZOTON     6  20sep2000 00:00:00  20sep2000         R  488  2000    3 |
> >>>> 19. |  4    4       0  LOSEC     1                   .          .              .     .    4 |
> >>>> 20. |  4    4       0  NEXIUM    2                   .          .              .     .    4 |
> >>>>     |---------------------------------------------------------------------------------------|
> >>>> 21. |  4    4       1  PARIET    3  27oct2000 00:00:00  27oct2000        NL  489  2000    4 |
> >>>> 22. |  4    4       0  PROTIUM   4                   .          .              .     .    4 |
> >>>> 23. |  4    4       0  ZANTAC    5                   .          .              .     .    4 |
> >>>> 24. |  4    4       0  ZOTON     6                   .          .              .     .    4 |
> >>>> 25. |  4    5       0  LOSEC     1                   .          .              .     .    5 |
> >>>>     |---------------------------------------------------------------------------------------|
> >>>> 26. |  4    5       0  NEXIUM    2                   .          .              .     .    5 |
> >>>> 27. |  4    5       0  PARIET    3                   .          .              .     .    5 |
> >>>> 28. |  4    5       0  PROTIUM   4                   .          .              .     .    5 |
> >>>> 29. |  4    5       0  ZANTAC    5                   .          .              .     .    5 |
> >>>> 30. |  4    5       1  ZOTON     6  23oct2000 03:00:00  23oct2000        NS  489  2000    5 |
> >>>>
> >>>> I wish to fill in the missing values in each choice set, delineated
> >>>> by dn (Doctor), obs (Observation number) and choices (1 to 6).
> >>>> For each choice set one choice is chosen which contains full time
> >>>> information for that choice set, i.e. in set 1 choice 6 was chosen
> >>>> and shows the month 487. The other 5 choices show mth as missing.
> >>>> I want to fill these with the correct mth.
> >>>>
> >>>> I am sure there must be an elegant way to do this in R?
> >>>>
> >>>> Best wishes
> >>>>
> >>>> Graham
> >>>>
> >>>> ______________________________________________
> >>>> R-help@r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
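
On the question at the top of this thread about routinely checking the structure of the choice sets: one possible approach is to count, for every dn/obs set, the number of rows, the number of selected choices and the number of non-missing months, and then list the sets that break the rule of exactly one of each. The sketch below is only an illustration under that assumption; the helper name check_sets, its n_alt argument and the toy data are hypothetical and do not come from the thread.

```r
# Hypothetical helper: count rows, selected choices and non-missing months
# in every (dn, obs) choice set and return the sets that break the rules.
check_sets <- function(dat, n_alt = NULL) {
  counts <- aggregate(
    data.frame(n_rows   = rep(1L, nrow(dat)),
               n_chosen = as.integer(dat$choice == 1),
               n_mth    = as.integer(!is.na(dat$mth))),
    by  = list(dn = dat$dn, obs = dat$obs),
    FUN = sum, na.rm = TRUE)
  # A valid set has exactly one selected choice and one non-missing month
  ok <- counts$n_chosen == 1 & counts$n_mth == 1
  # Optionally also require a fixed number of alternatives per set
  if (!is.null(n_alt)) ok <- ok & counts$n_rows == n_alt
  counts[!ok, ]
}

# Made-up example: set 1 is fine, set 2 has two choices, set 3 has none
toy <- data.frame(dn     = 4,
                  obs    = rep(1:3, each = 3),
                  choice = c(0, 0, 1,   1, 1, 0,   0, 0, 0),
                  mth    = c(NA, NA, 487,   488, 488, NA,   NA, NA, NA))
bad <- check_sets(toy, n_alt = 3)
print(bad)
```

An empty result means every set passed. Because aggregate() only reports combinations that occur in the data, empty dn/obs combinations do not show up as false alarms. For very large data sets the same counts could be computed with a faster grouping tool, but the logic would stay the same.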