On 01/07/2009 1:26 PM, Mark Knecht wrote:
On Wed, Jul 1, 2009 at 9:39 AM, Duncan Murdoch<murd...@stats.uwo.ca> wrote:
On 01/07/2009 11:49 AM, Mark Knecht wrote:
Hi,
I have a data.frame that is date ordered by row number - earliest
date first and most current last. I want to create a couple of new
columns that show the max and min values from other columns *so far* -
not for the whole data.frame.
It seems this sort of question is really coming from my lack of
understanding about how R intends me to limit myself to portions of a
data.frame. I get the impression from the help files that the generic
way is that if I'm on the 500th row of a 1000 row data.frame and want
to limit the search max does to rows 1:500 I should use something
like [1:row] but it's not working inside my function. The idea works
outside the function, in the sense I can create temp1[1:7] and the
max function returns what I expect. How do I do this with row?
Simple example attached. hp should be 'highest p', ll should be
'lowest l'. I get an error message "Error in 1:row : NA/NaN argument"
Thanks,
Mark
<SNIP>
HighLow = function (MyFrame) {
    temp1 <- MyFrame$p[1:row]
    MyFrame$hp <- max(temp1) ## Highest p
    temp1 <- MyFrame$l[1:row]
    MyFrame$ll <- min(temp1) ## Lowest l
    return(MyFrame)
}
You get an error in this function because you didn't define row, so R
assumes you mean the function in the base package, and 1:row doesn't make
sense.
What you want for the "highest so far" is the cummax (for "cumulative
maximum") function. See ?cummax.
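As a minimal sketch of that suggestion, the function from the original post could be rewritten with cummax and its counterpart cummin, both in base R. The sample data is made up for illustration; only the column names p, l, hp and ll come from the post.

```r
# Hypothetical sample data; column names p and l follow the original post.
MyFrame <- data.frame(p = c(3, 1, 4, 1, 5, 9, 2),
                      l = c(8, 6, 7, 5, 3, 0, 9))

HighLow <- function(df) {
  df$hp <- cummax(df$p)  # highest p seen so far, row by row
  df$ll <- cummin(df$l)  # lowest l seen so far, row by row
  df
}

MyFrame <- HighLow(MyFrame)
print(MyFrame)
```

Both functions are vectorised, so no row index is needed inside the function.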
Duncan Murdoch
Duncan,
OK, thanks. That makes sense, as long as I want the cummax from the
beginning of the data.frame. (Which is exactly what I asked for!)
How would I do this in the more general case if I was looking for
the cummax of only the most recent 50 rows in my data.frame? What I'm
trying to get down to is that as I fill in my data.frame I need to be
able to get a max or min or standard deviation of the previous so many
rows of data - not the whole column - and I'm just not grasping how to
do this. It seems like I should be able to create a data set that's
only a portion of a column while I'm in the function and then take the
cummax on that, or use it as an input to a standard deviation, etc.?
What you describe might be called a "running max". The caTools package
has a runmax function that probably does what you want.
More generally, you can always write a loop. They aren't necessarily
fast or elegant, but they're pretty general. For example, to calculate
the max of the previous 50 observations (or fewer near the start of a
vector), you could do
x <- ... some vector ...
result <- numeric(length(x))
for (i in seq_along(x)) {
    result[i] <- max( x[ max(1, i-49):i ])
}
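The same windowing idea can be wrapped in a small helper so that min or sd can be swapped in for max. This is only a sketch in base R: run_stat and its arguments are hypothetical names, not functions from any package, and the sample vector is made up.

```r
# Hypothetical helper: apply FUN to the last `width` values ending at each
# position i (fewer values near the start of the vector).
run_stat <- function(x, width, FUN) {
  vapply(seq_along(x), function(i) {
    FUN(x[max(1, i - width + 1):i])
  }, numeric(1))
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)   # made-up data
run_max <- run_stat(x, 3, max)
run_min <- run_stat(x, 3, min)
run_sd  <- run_stat(x, 3, sd)    # first element is NA: sd of one value
print(run_max)
print(run_min)
```

Like the explicit loop, this makes one call to FUN per element, so it is general rather than fast.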
Duncan Murdoch
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.