On May 8, 2014, at 9:49 AM, Abhinaba Roy wrote:

> Hi R helpers,
>
> I have a dataframe like
>
> ID           Yr_Mnth  AMT_PAID  AMT_DUE    paidToDue
> CS00000026A   201301    320.48     1904  0.168319328
> CS00000026A   201302   4881.31    15708  0.310753119
> CS00000026A   201303   7609.04    25585  0.297402384
> CS00000026A   201304   9782.70    21896  0.446780234
> CS00000026A   201305   6482.01    22015  0.294436066
> CS00000026A   201306   5226.28    14280  0.365985994
> CS00000026A   201307   9078.47    19040  0.476810399
> CS00000026A   201308   7060.33    23800  0.296652521
> CS00000026A   201309   7595.57    17136  0.443252218
> CS00000026A   201310   5388.64    24752  0.217705236
>
> The problem I am facing is to capture the change in 'paidToDue', which is
> defined as follows.
>
> Let 'm' be the value of 'Yr_Mnth' in the current row (except the 1st row)
> and 'm-1' be that in the previous row.
>
> I am trying to add a column 'Change' to the dataframe, which will have the
> values 'Improve', 'Deteriorate' and 'No change', defined as
>
> if (AMT_PAID(m) != AMT_PAID(m-1)) & sign(paidToDue(m)-paidToDue(m-1)) == 1 &
> abs(paidToDue(m)-paidToDue(m-1)) > 0.1 then 'Change' = 'Improve'
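Since the posting guide asks for self-contained, reproducible code, the rows above can be entered directly with read.table(text = ...) instead of relying on an attached file. This is only a sketch: the object name `dat` is an assumption, and just the ten posted rows (a single ID) are included. The last line shows the month-over-month change in paidToDue that the rules are built on.

```r
# Hypothetical name `dat`: the ten rows posted above (one ID only),
# read from inline text so the example needs no external CSV file.
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID          Yr_Mnth AMT_PAID AMT_DUE paidToDue
CS00000026A  201301   320.48    1904 0.168319328
CS00000026A  201302  4881.31   15708 0.310753119
CS00000026A  201303  7609.04   25585 0.297402384
CS00000026A  201304  9782.70   21896 0.446780234
CS00000026A  201305  6482.01   22015 0.294436066
CS00000026A  201306  5226.28   14280 0.365985994
CS00000026A  201307  9078.47   19040 0.476810399
CS00000026A  201308  7060.33   23800 0.296652521
CS00000026A  201309  7595.57   17136 0.443252218
CS00000026A  201310  5388.64   24752 0.217705236
")
str(dat)

# paidToDue(m) - paidToDue(m-1) for every month after the first:
# the result has one element fewer than there are rows.
diff(dat$paidToDue)
```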

There is a `diff` function that may make this all much simpler. You could
translate

    (AMT_PAID[m] != AMT_PAID[m-1])

to

    diff(AMT_PAID) != 0    # length is 1 shorter than the input vector

and

    sign(paidToDue[m] - paidToDue[m-1]) == 1

to

    diff(paidToDue) > 0    # can pad with c(NA, ...)

From your incorrect use of parentheses for indexing, I'm guessing you are very
new to R programming. You also attempted to paste a CSV file, and that was
rejected by the mail server, which only accepts MIME-text formatted files.
Despite the fact that most CSV files really are text files, they often get
labeled differently by posters' mail clients.

> if (AMT_PAID(m) != AMT_PAID(m-1)) & sign(paidToDue(m)-paidToDue(m-1)) == -1 &
> abs(paidToDue(m)-paidToDue(m-1)) > 0.1 then 'Change' = 'Deteriorate'
>
> else 'Change' = 'No change'

If this were just a matter of differences in 'paidToDue' within values of ID,
then it would be as simple as:

    dat$Change <- with(dat, ave(paidToDue, ID, FUN = function(x) {
        c(NA, c('Deteriorate', 'No change', 'Improve')[
                findInterval(diff(x), c(-Inf, -0.1, 0.1, Inf))])
    }))

> Note: I have 5000 unique IDs in the data and this has to be done for each ID,
> and the data is sorted by Yr_Mnth.

When you need to use multiple columns as input and work across rows, I
generally use an lapply(split(), fun) strategy.

> Please find attached the csv file for reference.
>
> How can it be done in R?

It's not going to be terribly difficult, but I'm concerned this is homework,
so I'm not trying for a complete solution. You have not done very much in the
way of setting the context.

> --
> Regards
> Abhinaba Roy
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
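A minimal sketch of the lapply(split(), fun) strategy mentioned in the reply, applied to the quoted rows. The names `dat`, `classify_change` and `result` are hypothetical, and only the single posted ID is used. It assumes that "sign(delta) == 1 & abs(delta) > 0.1" can be read as "delta > 0.1" (and likewise "delta < -0.1" for the negative case), and that a month whose AMT_PAID did not change is labeled 'No change'.

```r
# Hypothetical name `dat`: the ten rows quoted in the question (one ID only).
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID          Yr_Mnth AMT_PAID AMT_DUE paidToDue
CS00000026A  201301   320.48    1904 0.168319328
CS00000026A  201302  4881.31   15708 0.310753119
CS00000026A  201303  7609.04   25585 0.297402384
CS00000026A  201304  9782.70   21896 0.446780234
CS00000026A  201305  6482.01   22015 0.294436066
CS00000026A  201306  5226.28   14280 0.365985994
CS00000026A  201307  9078.47   19040 0.476810399
CS00000026A  201308  7060.33   23800 0.296652521
CS00000026A  201309  7595.57   17136 0.443252218
CS00000026A  201310  5388.64   24752 0.217705236
")

# Hypothetical helper: label each month of ONE ID relative to the previous
# month. The first month has no m-1, so its padded NAs propagate to an NA label.
classify_change <- function(d) {
  d <- d[order(d$Yr_Mnth), ]                    # make sure months are in order
  paid_changed <- c(NA, diff(d$AMT_PAID) != 0)  # AMT_PAID(m) != AMT_PAID(m-1)
  delta        <- c(NA, diff(d$paidToDue))      # paidToDue(m) - paidToDue(m-1)
  d$Change <- ifelse(paid_changed & delta > 0.1, "Improve",
              ifelse(paid_changed & delta < -0.1, "Deteriorate", "No change"))
  d
}

# Split by ID, classify each piece, then stack the pieces back together.
result <- do.call(rbind, lapply(split(dat, dat$ID), classify_change))
rownames(result) <- NULL
result[, c("Yr_Mnth", "paidToDue", "Change")]
```

Because the comparisons are strict, a change of exactly 0.1 in either direction is labeled 'No change'. Note also that split() orders the groups by sorted ID, so the stacked result may not keep the original row order across IDs.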