[R] use subset to trim data but include last per category

Giovanni Azua Sun, 09 Sep 2012 08:15:43 -0700

Hello,

I bumped into the following funny use-case. I have too much data for a given 
plot. I have the following data frame df:


> str(df)
'data.frame':   5015 obs. of  5 variables:
 $ n          : Factor w/ 5 levels "1000","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ iter       : int  10 20 30 40 50 60 70 80 90 100 ...
 $ Error      : num  1.05e-02 1.24e-03 3.67e-04 1.08e-04 4.05e-05 ...
 $ Duality_Gap: num  20080 3789 855 443 321 ...
 $ Runtime    : num  0.00536 0.01353 0.01462 0.01571 0.01681 ...

But if I plot e.g. Runtime vs log(Duality_Gap), I have too many observations, 
because I take a snapshot every 10 iterations rather than, say, every 500, and 
the plot looks very cluttered. So I would like to trim the data frame to only 
those records for which iter is a multiple of 500, and so I do this:

df <- subset(df, iter %% 500 == 0)

This gives me almost exactly what I need, except that the last and most 
important Duality Gap observations are of course gone due to the filtering ... 
I would like to change the subset clause to be iter %% 500 == 0 _or_ the record 
is the last per n (n is my problem size and category in this case) ... how can 
I do that?

I thought of adding a new Boolean column "last" that flags whether a given row 
is the last element per category, but this is a bit too complicated .. is 
there a simpler condition construct that can be used with the subset command?
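
For illustration, a minimal sketch of one such condition, assuming that the 
"last" record per n is the one with the largest iter in that category. Base R's 
ave() returns the per-group maximum aligned with the rows, so no helper column 
is needed. The toy data frame below is made up and only mimics the n and iter 
columns of df:

```r
# Toy stand-in for df (hypothetical values; only n and iter matter here)
df <- data.frame(
  n    = factor(rep(c("1000", "2000"), times = c(120, 75))),
  iter = c(seq(10, 1200, by = 10), seq(10, 750, by = 10))
)

# Keep every 500th iteration, plus the row with the largest iter within each n
trimmed <- subset(df, iter %% 500 == 0 | iter == ave(iter, n, FUN = max))
```

If the last record were defined by row position rather than by the largest 
iter, !duplicated(n, fromLast = TRUE) could possibly serve as the second 
condition instead.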

TIA,
Best regards,
Giovanni    
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
