Re: [R] sorting/grouping/classification problem?

Bart Joosen Fri, 25 Jan 2013 10:56:52 -0800

Hi,


To clarify further: these are results for degradation studies.

We look for degradants at 0 months, again at 3 months, again at 6 months, 
...
Each analysis gives us an rrt and a result.
To draw final conclusions, we have to align the results manually (at least for 
now).


The rrt depends on many factors, so it varies a bit (e.g. 
0.48 at 0 months, 0.46 at 6 months, and again 0.48 at 9 months).


If you take a look at the sample data, you can see that the degradant with 
rrt 0.48 is increasing over time, so you can clearly see that 0.48 and 0.46 
are essentially the same degradant.


But rounding alone doesn't solve everything: it can match the 0.46 at 6 months 
to the degradant with rrt 0.45 at 0 months, which gives a really odd 
trend line for that degradant.
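
To make the mismatch concrete, here is a small sketch using the 0-month and 6-month rrt values from the sample data. It is not rounding itself but the closely related nearest-value matching, which fails the same way. The names `ref`, `new` and `nearest` are only illustrative.

```r
# rrt values taken from the sample data
ref <- c(0.45, 0.48, 1.24)          # 0 months
new <- c(0.35, 0.44, 0.46, 1.21)    # 6 months

# Match every 6-month rrt to the closest 0-month rrt,
# ignoring peak order and without any tolerance limit
nearest <- sapply(new, function(x) ref[which.min(abs(ref - x))])

data.frame(rrt_6m = new, matched_0m = nearest)
```

Both 0.44 and 0.46 end up on 0.45, and the new peak at 0.35 is pulled onto 0.45 as well, so a purely distance-based rule is not enough here.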


I was thinking about making a list of all rrts, calculating every possible 
combination of shuffling within certain limits (e.g. max 10% or so), calculating 
r2 for each combination, and keeping the combination that maximizes it?


That seems very brute-force and inelegant, though?
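
One possibly less brute-force direction, since the rrts never switch places, is an order-preserving alignment by dynamic programming (the same idea as sequence alignment). What follows is only a sketch under assumptions: the function name `align_rrt`, the tolerance `tol` and the penalty `gap` are made up for illustration, it minimises the total rrt difference instead of maximising r2, and it aligns one time point against one reference list only.

```r
# Align a sorted vector of new rrts to a sorted reference vector.
# A pair may only match when the rrts differ by at most `tol`;
# leaving a peak unmatched (on either side) costs `gap`.
# `tol` and `gap` are illustrative assumptions, not validated values.
align_rrt <- function(ref, new, tol = 0.05, gap = 0.05) {
  n <- length(ref)
  m <- length(new)
  cost <- matrix(0, n + 1L, m + 1L)
  cost[, 1] <- (0:n) * gap
  cost[1, ] <- (0:m) * gap
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d <- abs(ref[i] - new[j])
      m_cost <- if (d <= tol) cost[i, j] + d else Inf
      cost[i + 1L, j + 1L] <- min(m_cost,
                                  cost[i, j + 1L] + gap,   # skip ref[i]
                                  cost[i + 1L, j] + gap)   # skip new[j]
    }
  }
  # Trace back from the corner to recover which ref each new rrt got
  out <- rep(NA_integer_, m)
  i <- n
  j <- m
  while (i > 0L && j > 0L) {
    d <- abs(ref[i] - new[j])
    if (d <= tol && isTRUE(all.equal(cost[i + 1L, j + 1L], cost[i, j] + d))) {
      out[j] <- i
      i <- i - 1L
      j <- j - 1L
    } else if (isTRUE(all.equal(cost[i + 1L, j + 1L], cost[i, j + 1L] + gap))) {
      i <- i - 1L
    } else {
      j <- j - 1L
    }
  }
  out  # index into `ref` for each element of `new`, NA = new peak
}

ref <- c(0.45, 0.48, 1.24)          # 0 months
m6  <- c(0.35, 0.44, 0.46, 1.21)    # 6 months
idx <- align_rrt(ref, m6)
data.frame(rrt = m6, adjusted_rrt = ref[idx])
```

On the sample data this should leave 0.35 unmatched (a new degradant) and map 0.44, 0.46 and 1.21 onto 0.45, 0.48 and 1.24. The result column could then serve as the adjusted rrt for the db update, but the tolerance would need checking against real data.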


Bart



> Date: Fri, 25 Jan 2013 10:01:44 -0800
> From: smartpink...@yahoo.com
> Subject: Re: [R] sorting/grouping/classification problem?
> To: bartjoo...@hotmail.com
> CC: djmu...@gmail.com; r-help@r-project.org
> 
> Hi,
> 
> Your question is a bit confusing to me.
> When you say "which rrts are the same, and which are the new ones", 
> to me it looks like "0.35, 0.36" are new additions to Mnd at time points 6 and 
> 9.
> Extending Dennis' solution:
> Just for understanding the problem:
> vec1 <- c(0.45, 0.48, 1.24, 1.22, 0.44, 0.46, 1.21)
> dat$newCol <- ifelse(dat$rrt %in% vec1, "old", "new")
> dcast(dat, Time + newCol ~ Mnd, value.var = "Result")
> #    Time newCol   0   3    6    9
> #1 0.3550    new  NA  NA 0.05 0.06
> #2 0.4475    old 0.1 0.2 0.40 0.60
> #3 0.4750    old 0.3 0.6 1.20 1.80
> #4 1.2225    old 0.5 0.4 0.45 0.50
> A.K.
> 
> 
> 
> 
> 
> ----- Original Message -----
> From: Bart Joosen <bartjoo...@hotmail.com>
> To: Dennis Murphy <djmu...@gmail.com>; r-help@r-project.org
> Cc: 
> Sent: Friday, January 25, 2013 1:48 AM
> Subject: Re: [R] sorting/grouping/classification problem?
> 
> Nice suggestion for the extra "Time" column.
> 
> But I think I didn't state my problem clearly enough.
> My main problem is to find a way to "classify" the rrts, so that we don't 
> have to check each dataframe ourselves.
> 
> So I need a function that fills in the extra "Time" column by looking 
> at the rrts (and maybe the results), and decides which rrts are 
> the same, and which are new ones.
> 
> As stated: rrts never switch places, and results can't be concatenated or 
> averaged within a Mnd.
> 
> I hope my question is a bit more clear now.
> 
> Thank you all for your suggestions
> 
> Bart
> 
> > Date: Thu, 24 Jan 2013 15:01:40 -0800
> > Subject: Re: [R] sorting/grouping/classification problem?
> > From: djmu...@gmail.com
> > To: bartjoo...@hotmail.com
> > 
> > Hi:
> > 
> > Here's a potential workaround:
> > 
> > # Add a time order variable
> > dat$ord <- c(rep(2:4, 2), rep(1:4, 2))
> > 
> > # Average rrt by ord
> > dat$Time <- with(dat, ave(rrt, ord, FUN = mean))
> > dat
> > 
> > # Reshape the data
> > 
> > library(reshape2)
> > dcast(dat, Time ~ Mnd, value.var = "Result")
> >     Time   0   3    6    9
> > 1 0.3550  NA  NA 0.05 0.06
> > 2 0.4475 0.1 0.2 0.40 0.60
> > 3 0.4750 0.3 0.6 1.20 1.80
> > 4 1.2225 0.5 0.4 0.45 0.50
> > 
> > You could always round dat$Time to two decimal places in its
> > definition before doing the cast if you so desired.
> > 
> > Dennis
> > 
> > On Thu, Jan 24, 2013 at 11:31 AM, Bart Joosen <bartjoo...@hotmail.com> 
> > wrote:
> > >
> > > Hi,
> > >
> > >
> > > > I'm a database admin for a database which manages chromatographic results 
> > > > of products during stability studies.
> > > I use R for the reporting of the results in MS Word through R2wd.
> > >
> > >
> > > But now I think I need your help:
> > > suppose we have the following data frame:
> > >
> > >
> > > > ID  rrt Mnd Result
> > > >  1 0.45   0   0.10
> > > >  1 0.48   0   0.30
> > > >  1 1.24   0   0.50
> > > >  2 0.45   3   0.20
> > > >  2 0.48   3   0.60
> > > >  2 1.22   3   0.40
> > > >  3 0.35   6   0.05
> > > >  3 0.44   6   0.40
> > > >  3 0.46   6   1.20
> > > >  3 1.21   6   0.45
> > > >  4 0.36   9   0.06
> > > >  4 0.45   9   0.60
> > > >  4 0.48   9   1.80
> > > >  4 1.22   9   0.50
> > >
> > >
> > >
> > > ID is the database ID, rrt is an identifier for the result, Mnd is the 
> > > timepoint of analysis and Result is... the result of the test.
> > > What I need is this dataframe in a wide format (which I managed with dat2 
> > > <- as.data.frame(tapply(dat$Result,list(rrt=dat$rrt,Mnd=dat$Mnd), 
> > > function(x) paste(x[x!=""],collapse="/"))) )
> > > But as you can see, rrt is not an exact identifier for the result.
> > >
> > > Sometimes rrt for 0 Mnd is 0.45, but at 6 Mnd the rrt is 0.44.
> > > Now I need the results to align so that one can easily see how rrt x is 
> > > evolving within the Mnd time points.
> > > I tried with different rounding procedures (round every 0.02, check that 
> > > no results are discarded this way, and check for alignment), but nothing 
> > > seems to make some sense.
> > > Also tried checking the highest results in each Mnd, align these, 
> > > determine correction factors for the rrt for all the other rrts, ...
> > >
> > >
> > > Some results will follow a trend (like rrt 0.45), some will remain more 
> > > or less stable.
> > > > But rrts will NEVER switch places with each other!
> > >
> > >
> > >
> > >
> > > > Ultimately I need to update the db, so I need a list/dataframe with 
> > > > the ID, the original rrt and the adjusted rrt (maybe the first occurring 
> > > > rrt, or the mean of the rrts, doesn't matter).
> > >
> > >
> > >
> > >
> > > Any ideas about which algorithms can be used? I searched on pubmed, but 
> > > couldn't find anything
> > >
> > >
> > >
> > >
> > > Thanks
> > >
> > >
> > > Bart
> > >
> > >
> > > PS: to get the data:
> > >
> > >
> > > dat <-
> > > structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
> > > 4L, 4L, 4L, 4L), rrt = c(0.45, 0.48, 1.24, 0.45, 0.48, 1.22,
> > > 0.35, 0.44, 0.46, 1.21, 0.36, 0.45, 0.48, 1.22), Mnd = c(0L,
> > > 0L, 0L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 9L, 9L, 9L, 9L), Result = c(0.1,
> > > 0.3, 0.5, 0.2, 0.6, 0.4, 0.05, 0.4, 1.2, 0.45, 0.06, 0.6, 1.8,
> > > 0.5)), .Names = c("ID", "rrt", "Mnd", "Result"), class = "data.frame", 
> > > row.names = c(NA,
> > > -14L))
> > >
> > >
> > >
> > > resulting dataframe:
> > > dat3 <-
> > > structure(list(Time = c(0.355, 0.45, 0.48, 1.22), `0` = c(NA,
> > > 0.1, 0.3, 0.5), `3` = c(NA, 0.2, 0.6, 0.4), `6` = c(0.05, 0.4,
> > > 1.2, 0.45), `9` = c(0.06, 0.6, 1.8, 0.5)), .Names = c("Time",
> > > "0", "3", "6", "9"), class = "data.frame", row.names = c(NA,
> > > -4L))
> > >
> > >
> > >
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
