Re: [R] Ordering Duplicates for Selection

jim holtman Tue, 05 Oct 2010 08:59:36 -0700

Here is a way of putting "Order" on your data:

> x
  V1       V2     V3 V4         V5
1  1 12345678 Soc101 34 02-04-2003
2  2 12345678 Soc101 62 31-11-2004
3  3 12345678 Psy104 63 03-05-2003
4  4 23456789 Soc101 73 02-04-2003
5  5 23456789 Psy104 76 25-02-2004
> x$order <- ave(x$V1, x$V2, x$V3, FUN=seq_along)
> x
  V1       V2     V3 V4         V5 order
1  1 12345678 Soc101 34 02-04-2003     1
2  2 12345678 Soc101 62 31-11-2004     2
3  3 12345678 Psy104 63 03-05-2003     1
4  4 23456789 Soc101 73 02-04-2003     1
5  5 23456789 Psy104 76 25-02-2004     1
>
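
One caveat: ave() with seq_along numbers the rows in the order they appear within each student/course group, so the result above relies on the rows already being in chronological order. If that is not guaranteed, sort first. A minimal sketch, assuming V5 holds day-month-year strings; the data frame is retyped from the post, except that the second date is written as 30-11-2004 (my assumption, since 31-11-2004 is not a valid calendar date), and the rows are shuffled on purpose to show the effect. The `date` column is a helper name introduced here:

```r
# Data retyped from the post; second date changed to 30-11-2004 (assumption).
x <- data.frame(
  V1 = 1:5,
  V2 = c(12345678, 12345678, 12345678, 23456789, 23456789),
  V3 = c("Soc101", "Soc101", "Psy104", "Soc101", "Psy104"),
  V4 = c(34, 62, 63, 73, 76),
  V5 = c("02-04-2003", "30-11-2004", "03-05-2003", "02-04-2003", "25-02-2004"),
  stringsAsFactors = FALSE
)

# Shuffle so the retake (mark 62) comes before the first attempt (mark 34).
x <- x[c(2, 1, 3, 5, 4), ]

# Parse the strings so sorting is chronological rather than alphabetical.
x$date <- as.Date(x$V5, format = "%d-%m-%Y")

# Sort by student, course, then date; ave() then numbers attempts in date order.
x <- x[order(x$V2, x$V3, x$date), ]
x$order <- ave(seq_along(x$V2), x$V2, x$V3, FUN = seq_along)

x[, c("V2", "V3", "V4", "V5", "order")]
```

After sorting, the Soc101 record with mark 34 gets order 1 and the retake with mark 62 gets order 2, regardless of how the rows were arranged to begin with.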



On Tue, Oct 5, 2010 at 11:42 AM, C C <ps...@hotmail.com> wrote:
>
> Hi all,
>
> I've found a lot of helpful info regarding identifying and deleting 
> duplicates but I'd like to do something a little different - I'd like to 
> identify the duplicate values but instead of deletion, label them with a 
> value.
>
> I am working with historical data regarding school courses:
>
>
>
>    Student Number  Course  Final Mark  Completed Date
> 1  12345678        Soc101  34          02-04-2003
> 2  12345678        Soc101  62          31-11-2004
> 3  12345678        Psy104  63          03-05-2003
> 4  23456789        Soc101  73          02-04-2003
> 5  23456789        Psy104  76          25-02-2004
>
>
> In this data frame, records 1 and 2 contain data for the same student taking 
> the same course.  In record 1, the student failed (Final Mark), took the 
> course again (Completed Date) and finally passed (Final Mark) in record 2.
>
> I'd like to be able to work with the data so that I could summarize the 
> achievement distribution for the first attempt records and then compare it to 
> the achievement distribution for the second attempt records.  In Excel I'd 
> use something like COUNTIF($A$2:A2,A2) in a new column and then summarize the 
> "1" values and "2" values.
>
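
For the first-versus-second attempt comparison described here, a running count per student/course group plays roughly the role of the filled-down COUNTIF column. A minimal sketch, assuming the rows are already in chronological order; the column names `student`, `course`, `mark` and `attempt` are placeholders of mine, not names from your data:

```r
# Data retyped from the post, with illustrative column names.
x <- data.frame(
  student = c(12345678, 12345678, 12345678, 23456789, 23456789),
  course  = c("Soc101", "Soc101", "Psy104", "Soc101", "Psy104"),
  mark    = c(34, 62, 63, 73, 76),
  stringsAsFactors = FALSE
)

# Running count within each student/course group (1 = first attempt, 2 = retake).
x$attempt <- ave(seq_along(x$mark), x$student, x$course, FUN = seq_along)

# Number of records per attempt.
attempt_counts <- table(x$attempt)

# Marks grouped by attempt, ready for summary(), hist(), boxplot(), etc.
by_attempt <- split(x$mark, x$attempt)
lapply(by_attempt, summary)
```

With only five records the summaries are not very informative, but the same calls apply unchanged to the full data set.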
>    Order  Student Number  Course  Final Mark  Completed Date
> 1  1      12345678        Soc101  34          02-04-2003
> 2  2      12345678        Soc101  62          31-11-2004
> 3  1      12345678        Psy104  63          03-05-2003
> 4  1      23456789        Soc101  73          02-04-2003
> 5  1      23456789        Psy104  76          25-02-2004
>
>
> I suspect the answer is in the list discussions on "deleting duplicate 
> records" but I'm still familiarizing myself with R and I'm not at a point to 
> be able to see how it could be modified.  Any thoughts?
>
> Cheers,
> Chris
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
