Re: [R] Removing & generating data by category

David Winsemius Thu, 29 Oct 2009 18:27:55 -0700

Color me puzzled. Can you express the rule more clearly in Boolean logic?


If someone has five policies: 3 Life and 2 General ...  is he in or out?

Applying the alternate strategy to that data set I get:

> out <- tapply( dat$clm, dat$uid, paste, collapse=",")
> out
                         A1.B1                          A2.B2                          A3.B1
                     "General"                 "General,Life"                      "General"
                         A3.B3                          A4.B4                          A5.B5
"General,Life,General,General"         "General,Life,General"                 "General,Life"
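The thread does not show how dat$uid was created for the enlarged data set. The following is a minimal sketch for reproducing the output above, assuming uid is simply id and loc pasted together with a dot, as in Adaikalavan's earlier reply for the smaller data set.

```r
# Rebuild the 13-row example from Steven's message.
a <- data.frame(id  = c("A1", "A2", "A3", "A4", "A5", "A3", "A2", "A3", "A4", "A5"),
                loc = c("B1", "B2", "B3", "B4", "B5"),   # recycled to 10 rows
                clm = c(rep("General", 6), rep("Life", 4)))
b <- data.frame(id  = c("A3", "A3", "A4"),
                loc = c("B3", "B3", "B4"),
                clm = rep("General", 3))
dat <- rbind(a, b)

# Assumption: the unique id is id and loc joined with "."
dat$uid <- paste(dat$id, dat$loc, sep = ".")

# One comma-separated string of policy types per id/loc combination.
out <- tapply(dat$clm, dat$uid, paste, collapse = ",")
out
```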


Please explain why you want A3.B3.

--
David.

On Oct 29, 2009, at 8:56 PM, Steven Kang wrote:

I highly appreciate all the help.

I have one more thing to resolve.
Suppose 3 additional records are bound to the previous arbitrary data set,
i.e.
> a <- data.frame(id=c(c("A1","A2","A3","A4","A5"),c("A3","A2","A3","A4","A5")),loc=c("B1","B2","B3","B4","B5"),clm=c(rep(("General"),6),rep("Life",4)))
> b <- data.frame(id=c("A3","A3","A4"),loc=c("B3","B3","B4"),clm=rep("General",3))
> dat <- rbind(a,b)
> dat
   id loc     clm
1  A1  B1 General
2  A2  B2 General
3  A3  B3 General
4  A4  B4 General
5  A5  B5 General
6  A3  B1 General
7  A2  B2    Life
8  A3  B3    Life
9  A4  B4    Life
10 A5  B5    Life
11 A3  B3 General
12 A3  B3 General
13 A4  B4 General
The records in rows 3, 11 & 12 are identical, as are the records in rows 4 & 13:
   id loc     clm             id loc     clm
3  A3  B3 General          4  A4  B4 General
11 A3  B3 General          13 A4  B4 General
12 A3  B3 General
The provided solutions do not perform one-to-one matching (i.e. all the matching duplicated records are removed).
The desired output is:

   id loc     clm
1  A1  B1 General
6  A3  B1 General
11 A3  B3 General
12 A3  B3 General
13 A4  B4 General
Is there a solution to this problem using a 'merging' function or some other alternative method?
Thanks



Steven
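One possible reading of the desired output is that each "Life" record cancels exactly one "General" record with the same id and loc, and whatever is left unpaired is kept. The following is a minimal sketch under that assumption; it is not a solution posted in the thread, and the helper names uid, k, key and paired are illustrative only.

```r
a <- data.frame(id  = c("A1", "A2", "A3", "A4", "A5", "A3", "A2", "A3", "A4", "A5"),
                loc = c("B1", "B2", "B3", "B4", "B5"),
                clm = c(rep("General", 6), rep("Life", 4)))
b <- data.frame(id  = c("A3", "A3", "A4"),
                loc = c("B3", "B3", "B4"),
                clm = rep("General", 3))
dat <- rbind(a, b)

uid <- paste(dat$id, dat$loc, sep = ".")

# Number the records within each (uid, clm) group: 1st General, 2nd General, ...
k <- ave(seq_len(nrow(dat)), uid, dat$clm, FUN = seq_along)

# The n-th General and the n-th Life record of the same uid share a key.
key <- paste(uid, k)

# Keys present under both categories form one General/Life pair; drop both rows.
paired <- intersect(key[dat$clm == "General"], key[dat$clm == "Life"])
res <- dat[!(key %in% paired), ]
res
```

Because this relies only on vectorised calls (paste, ave, intersect, %in%), it avoids an explicit row-by-row loop, although it has not been timed on a data set of the size mentioned in the original question.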
On Thu, Oct 29, 2009 at 10:30 PM, Adaikalavan Ramasamy <a.ramas...@imperial.ac.uk> wrote:
Here is another way based on pasting ids as hinted below:


a <- data.frame(id=c(c("A1","A2","A3","A4","A5"),
                  c("A3","A2","A3","A4","A5")),
                  loc=c("B1","B2","B3","B4","B5"),
                  clm=c(rep(("General"),6),rep("Life",4)))

a$uid <- paste(a$id, ".", a$loc, sep="")

out <- tapply( a$clm, a$uid, paste ) # can also add collapse=","
$A1.B1
[1] "General"

$A2.B2
[1] "General" "Life"

$A3.B1
[1] "General"

$A3.B3
[1] "General" "Life"

$A4.B4
[1] "General" "Life"

$A5.B5
[1] "General" "Life"


Then here are those with single policies.

> out[ which( sapply(out, length) == 1 ) ]
$A1.B1
[1] "General"

$A3.B1
[1] "General"




David Winsemius wrote:
On Oct 28, 2009, at 9:30 PM, Steven Kang wrote:

Dear R users,


Basically, from the following arbitrary data set:

a <- data.frame(id=c(c("A1","A2","A3","A4","A5"),c("A3","A2","A3","A4","A5")),loc=c("B1","B2","B3","B4","B5"),clm=c(rep(("General"),6),rep("Life",4)))
a
   id loc     clm
1  A1  B1 General
2  A2  B2 General
3  A3  B3 General
4  A4  B4 General
5  A5  B5 General
6  A3  B1 General
7  A2  B2    Life
8  A3  B3    Life
9  A4  B4    Life
10 A5  B5    Life
I would like to remove the records (highlighted above) that have identical
values in the "id" and "loc" fields but a different value of "clm" (i.e.
according to category).

Take a look at this merge operation on separate rows of "a".
> merge( a[a$clm=="Life", ], a[a$clm=="General", ] , by=c("id","loc"), all=T)
  id loc clm.x   clm.y
1 A1  B1  <NA> General
2 A2  B2  Life General
3 A3  B1  <NA> General
4 A3  B3  Life General
5 A4  B4  Life General
6 A5  B5  Life General
Assigning that object and selecting with is.na should complete the process.
> a2m <- merge( a[a$clm=="Life", ], a[a$clm=="General", ] , by=c("id", "loc"), all=T)
> a2m[ is.na(a2m$clm.x) | is.na(a2m$clm.y), ]
  id loc clm.x   clm.y
1 A1  B1  <NA> General
3 A3  B1  <NA> General
Alternate methods might include paste-ing id to loc and removing duplicates.
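A minimal sketch of that paste-and-remove-duplicates idea on the original 10-row data set follows. The variable names uid, dup_keys and single are illustrative and do not come from the thread.

```r
a <- data.frame(id  = c("A1", "A2", "A3", "A4", "A5", "A3", "A2", "A3", "A4", "A5"),
                loc = c("B1", "B2", "B3", "B4", "B5"),
                clm = c(rep("General", 6), rep("Life", 4)))

uid <- paste(a$id, a$loc, sep = ".")

# duplicated() flags only the second and later copies of a key, so collect
# the repeated keys first and then drop every row that carries one of them.
dup_keys <- unique(uid[duplicated(uid)])
single <- a[!(uid %in% dup_keys), ]
single
```

Note that this drops every id/loc combination that occurs more than once, whatever its clm values, which matches the desired output only when duplicates always mean one "General" plus one "Life" record.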
i.e.
categ <- table(a$id,a$clm)
categ
   General Life
 A1       1    0
 A2       1    1
 A3       2    1
 A4       1    1
 A5       1    1

The desired output is

  id   loc  clm
1  A1  B1 General
6  A3  B1 General

Because the data set I am working on is quite big (~ 800,000 x 20),
with the majority of the field values being long strings, looping turned out to
be very inefficient for comparing individual rows.
Are there any alternative efficient methods for solving this problem?
Steven


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
