Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Kevin Wamae
Hi Bert, The “status” at the end of the study does exist in the original dataset; what was missing was the time between events. And there exist so many events that fall between the first and last day to be explored in this work. The suggestion I received then was to compute time between the in

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Kevin Wamae
Hi Jeff, thanks and I will explore your suggestions too.. Regards --- Kevin Wame On 7/4/16, 12:43 AM, "Jeff Newmiller" wrote: There are a great many hits when I search on the keywords "kaplan meier plot R"... so my

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Bert Gunter
A Kaplan-Meier plot requires for each individual (in each treatment group, if there is more than one): 1. Survival time, which in your case appears to mean time without disease; 2. Status at end of time on study: whether the individual was censored (still without disease) or died (in your case, wa
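
A minimal sketch of the two inputs listed above, assuming the survival package (a recommended package shipped with standard R installations). The data frame km_data and its columns time, status and treated are hypothetical names and values made up for illustration; they are not from the thread.

```r
library(survival)

# Hypothetical toy data: one row per individual.
km_data <- data.frame(
  time    = c(30, 45, 60, 60, 90, 120, 120, 150),  # days without disease
  status  = c(1, 1, 0, 1, 0, 1, 0, 0),             # 1 = developed disease, 0 = censored
  treated = c("no", "no", "no", "no", "yes", "yes", "yes", "yes")
)

# One Kaplan-Meier curve per treatment group.
fit <- survfit(Surv(time, status) ~ treated, data = km_data)

plot(fit, lty = 1:2, xlab = "Days", ylab = "Proportion without disease")
legend("bottomleft", legend = names(fit$strata), lty = 1:2)
```

Only the per-individual time and status are needed here; no per-visit date columns have to be filled in beforehand.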

[R] Comparing two diagnostic tests with lme4

2016-07-03 Thread Keno Kyrill Bressem
Dear R experts, I am comparing two diagnostic tests. Therefore, I collected patient data from several studies. The dataframe is similar to this one: set.seed(10) data = data.frame( test1 = rbinom(1000, 1, 0.6), test2 = rbinom(1000, 1, 0.4), reference = rbinom(1000,

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Jeff Newmiller
There are a great many hits when I search on the keywords "kaplan meier plot R"... so my first reaction is that you should be referring to some of the existing packages for doing this type of analysis. I do not do this type of analysis normally, so am probably not your best helper... perhaps som

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Kevin Wamae
Hi Bert, my first task is to make a Kaplan Meier Plot to evaluate the risk of developing disease in the treated vs the non-treated individuals. I therefore figured it might be easier to compute dates first as any further analysis will be based on time, in this case days. I keep getting recommend

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Bert Gunter
I haven't followed this thread closely, but if it's not too late, I might suggest that you stop worrying about how you want your data frame to look and start worrying about how you want to display/analyze your data. As Jeff suggested, you and your supervisor are probably being driven by paradigms from

Re: [R] regroup row names

2016-07-03 Thread Ulrik Stervbo
Do the elements in 'locs' have an _ somewhere? If not, the search and replace finds nothing. Bert's suggestion of taking a substring is better if you are just interested in characters at fixed positions. Bert also suggested that you could maybe benefit from reading a few tutorials and I agree. B

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Kevin Wamae
Hi Jeff, It works well on a dataset with 10 rows and I figure it will work well with the “real” dataset. You’ve been of great help and I am starting to make headway. It creates a new dataframe (result), as shown below, that doesn’t quite have the result I want. ID admin

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Kevin Wamae
Thanks Jeff, let me try it on the larger dataset. Regards --- Kevin Wame On 7/3/16, 10:09 PM, "Jeff Newmiller" wrote: result <- ( result0 %>% select( -admin_period1 ) %>% inner_join( result0 %>

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Jeff Newmiller
Typo on the second line result <- ( result0 %>% select( -admin_period1 ) %>% inner_join( result0 %>% select( ID, admin_period1, end=start ) , by = c( ID="ID", admin_period ="admin_period1" ) ) %>% mutate( ddays = end -

Re: [R] regroup row names

2016-07-03 Thread Ulrik Stervbo
Hi Lily, My suggestion should remove the underscore and everything after it, leaving just aClim and bClim in the ID column. Best Ulrik On Sun, 3 Jul 2016, 20:34 lily li, wrote: > Hi Ulrik, > > Thanks. This is for one group, but how to do for several groups? I tried > gsub(c(),c(),df$ID), but i

Re: [R] regroup row names

2016-07-03 Thread Bert Gunter
I strongly suspect that you do not need to do this. What I think you do need to do is to create a new column (which will be a factor) identifying the climate ("a" or "b"), which can then be used to group climates in plots, used as a covariate in statistical analyses, etc. Moreover, there is probabl

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Jeff Newmiller
I still get the impression from your mixing of information types that you are thinking like this is Excel. Perhaps something like drug_study$admin_period <- ave( "Y" == drug_study$drug_admin, drug_study$ID, FUN=cumsum ) library(dplyr) result0 <- ( drug_study %>% filter( 0 != admin_
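
A small sketch of what the ave()/cumsum step quoted above computes, assuming a data frame sorted by ID with a drug_admin column of "Y"/"N" flags. The column names come from the message; the toy values are made up for illustration.

```r
# Hypothetical toy data: two individuals, several visits each.
drug_study <- data.frame(
  ID         = c(1, 1, 1, 1, 2, 2, 2),
  drug_admin = c("N", "Y", "N", "Y", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

# Running count of drug administrations within each ID:
# 0 before the first "Y", then 1, 2, ... for each later administration period.
drug_study$admin_period <- ave(
  "Y" == drug_study$drug_admin,
  drug_study$ID,
  FUN = cumsum
)

print(drug_study)
```

Rows with admin_period equal to 0 precede any administration for that ID, which is presumably why the quoted code goes on to filter them out.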

Re: [R] regroup row names

2016-07-03 Thread lily li
Hi Ulrik, Thanks. This is for one group, but how to do for several groups? I tried gsub(c(),c(),df$ID), but it does not work. On Sun, Jul 3, 2016 at 12:24 PM, Ulrik Stervbo wrote: > Hi Lily, > > you can use gsub: > > df$ID <- gsub("_.*", "", df$ID) > > HTH > Ulrik > > On Sun, 3 Jul 2016 at 20:

Re: [R] regroup row names

2016-07-03 Thread Ulrik Stervbo
Hi Lily, you can use gsub: df$ID <- gsub("_.*", "", df$ID) HTH Ulrik On Sun, 3 Jul 2016 at 20:16 lily li wrote: > I have a problem in changing row names in a dataframe in R. The first > column is ID, such as aClim_st02, aClim_st03, aClim_st 05, bClim_st01, > bClim_st02, etc. How to rename the
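
A short sketch of the gsub() call above on made-up data, showing that a single call handles every group at once, because the pattern only describes what to strip (the first underscore and everything after it). The value column and its numbers are hypothetical.

```r
# Hypothetical toy data with IDs shaped like those in the question.
df <- data.frame(
  ID    = c("aClim_st02", "aClim_st03", "bClim_st01", "bClim_st02"),
  value = c(1.2, 3.4, 5.6, 7.8),
  stringsAsFactors = FALSE
)

# "_.*" matches the first underscore and the rest of the string.
df$ID <- gsub("_.*", "", df$ID)

print(df$ID)
```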

[R] regroup row names

2016-07-03 Thread lily li
I have a problem in changing row names in a dataframe in R. The first column is ID, such as aClim_st02, aClim_st03, aClim_st 05, bClim_st01, bClim_st02, etc. How to rename the names, so that aClim_ all grouped to aClim, while bClim_ all grouped to bClim? Thanks for your help. df ID

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Kevin Wamae
Hi Jeff, it’s been an uphill task working with the dataset and I am not the first to complain. Nonetheless, data-cleaning is ongoing and since I cannot wait for that to get done, I decided to make the most of what the dataset looks like at this time. It appears the process may take a while. Tha

Re: [R] BCa Bootstrapped regression coefficients from lmrob function not working

2016-07-03 Thread peter dalgaard
> On 03 Jul 2016, at 13:47 , varin sacha via R-help > wrote: > > Dear R-experts, > > I am trying to calculate the bootstrapped (BCa) regression coefficients for a > robust regression using MM-type estimator (lmrob function from robustbase > package). > > My R code here below is showing a wa

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Jeff Newmiller
Your goal of putting character representations of dates in certain rows of a column is hard to imagine a use for. Your goal of identifying start and end dates seems reasonable enough. It can be accomplished using aggregate from base R (less external dependency) or summarise from dplyr (faster,

Re: [R] Extracting matrix from netCDF file using ncdf4 package

2016-07-03 Thread Bert Gunter
Well, yes, ... but no: there is no need to pre-define the matrix. The following is still a (interpreted) loop, but it is fast and short. ## ex is the downloaded array, here filled with random numbers reqX = c(35,35,40,65,95) reqY = c(2,5,10,112,120) out <-sapply(seq_along(reqX), function(i)ex[r
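
A hedged sketch of the sapply() idea above. The quoted code is truncated, so the array layout is an assumption: ex is taken to be a 3-d array indexed as [x, y, time], and the body of the anonymous function is a guess at the intent rather than the original code.

```r
set.seed(1)

# Assumed layout: 100 x-positions, 120 y-positions, 4 time steps.
ex <- array(rnorm(100 * 120 * 4), dim = c(100, 120, 4))

reqX <- c(35, 35, 40, 65, 95)
reqY <- c(2, 5, 10, 112, 120)

# For each requested (x, y) pair, pull the full series along the third dimension.
# sapply() binds the results column-wise, so no matrix needs to be pre-defined.
out <- sapply(seq_along(reqX), function(i) ex[reqX[i], reqY[i], ])

dim(out)  # rows = time steps, columns = requested points
```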

Re: [R] Extracting matrix from netCDF file using ncdf4 package

2016-07-03 Thread Hemant Chowdhary via R-help
Thank you both. Yes, this is basically the issue of being able to subset an array rather than extracting from the netCDF file. The dX = ncvar_get(nc=myNC, varid="myVar") command already results in the array. And one can subset that array using indices. In turn the problem can be stated as follows: Let u

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread John Kane
The data set did not show up. The R-help list tends to strip out most file types as a safety precaution. Try renaming the file from xxx.csv to xxx.txt and it should come through alright. John Kane Kingston ON Canada > -Original Message- > From: kwa...@kemri-wellcome.org > Sent: Sun,

[R] BCa Bootstrapped regression coefficients from lmrob function not working

2016-07-03 Thread varin sacha via R-help
Dear R-experts, I am trying to calculate the bootstrapped (BCa) regression coefficients for a robust regression using MM-type estimator (lmrob function from robustbase package). My R code here below is showing a warning message ([1] "All values of t are equal to 22.2073014256803\n Can not cal

Re: [R] trouble double looping to generate data for a meta-analysis

2016-07-03 Thread Jim Lemon
Hi Marietta, You may not be aware that the variable k is doing nothing in your example except running the random variable generation 2 or 3 times for each cycle of the outer loop as each successive run just overwrites the one before. If you want to include all two or three lots of values you will h

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Kevin Wamae
Hi Jeff, pardon me, I was surely not making it easy. I hope this time I will ☺ Attached is a snippet of the dataset in csv format and below is the R script I have managed so far. ---

Re: [R] lineplot.CI xaxis scale change in sciplot?

2016-07-03 Thread Jim Lemon
Hi Clemence, I don't have sciplot installed, but the help page suggests that the "xaxt" argument is available. This will prevent the x axis from being displayed and you can then specify the x axis you want. Assume that you want an x axis from 0 to 300 by 50: axis(1,at=seq(0,300,by=50)) Jim On T
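
A minimal sketch of the same approach using base plot() instead of sciplot's lineplot.CI, since whether lineplot.CI passes xaxt through is only suggested by its help page. The x and y values are made up for illustration.

```r
# Hypothetical data spanning the 0 to 300 range mentioned above.
x <- seq(0, 300, by = 25)
y <- sqrt(x)

# Suppress the default x axis, then draw one with ticks every 50 units.
ticks <- seq(0, 300, by = 50)
plot(x, y, type = "b", xaxt = "n")
axis(1, at = ticks)
```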