Re: [R] How can I rearange my dataframe

David Winsemius Tue, 09 Feb 2010 09:10:07 -0800


On Feb 9, 2010, at 11:24 AM, Alex Levitchi wrote:

Hello
I have recently begun working with R, so I am not very experienced.
But anyway, I cannot find a clear way to process my dataframe, which is a fairly big one.
It looks similar to this:
name=c("A","B","C","B","C","C","C","B","C")
nicknames=c("A1","B1","C1","B2","C2","C3","C4","B3","C5")
value=c(4,5,9,2,7,6,3,6,7)
table=data.frame(cbind(name,nicknames,value))
table
name nicknames value
1 A A1 4
2 B B1 5
3 C C1 9
4 B B2 2
5 C C2 7
6 C C3 6
7 C C4 3
8 B B3 6
9 C C5 7

So I have to rearrange it in the next way:
- the first column should contain just unduplicated data. I did this, it is OK, and it will look like
1 A
2 B
3 C
- the second column should contain the different 'nicknames' which correspond to the single A, B or C
name nickname value
1 A A1
2 B B1,B2,B3
3 C C1,C2,C3,C4,C5

Dataframes are not designed to hold irregular-length items. Lists are the data structure best suited for this type of data. tapply() is one function useful for collecting elements of one structure based on the contents of another ("name"):

(I renamed your table object "table1" to avoid confusion with the table function.)
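
For anyone following along, here is a minimal sketch of how table1 might be built. This is an assumption, not the exact call used in the thread: the original post went through cbind(), which turns every column into text first, whereas the version below passes the vectors straight to data.frame() so that value stays numeric. stringsAsFactors = TRUE is set explicitly so that the factor output shown in this reply is reproduced on any R version (the default changed in R 4.0.0).

```r
# Example data from the original post
name      <- c("A","B","C","B","C","C","C","B","C")
nicknames <- c("A1","B1","C1","B2","C2","C3","C4","B3","C5")
value     <- c(4,5,9,2,7,6,3,6,7)

# Assumed construction: no cbind(), so `value` keeps its numeric type;
# the two character columns become factors, as in the output of this thread
table1 <- data.frame(name, nicknames, value, stringsAsFactors = TRUE)

str(table1)
```

With this construction the tapply() calls on table1$nicknames and table1$name work as shown, and table1$value can be averaged directly.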


> tapply(table1$nicknames, table1$name, list)
$A
[1] A1
Levels: A1 B1 B2 B3 C1 C2 C3 C4 C5

$B
[1] B1 B2 B3
Levels: A1 B1 B2 B3 C1 C2 C3 C4 C5

$C
[1] C1 C2 C3 C4 C5
Levels: A1 B1 B2 B3 C1 C2 C3 C4 C5

The process of tabulating has created factor variables, which some would see as a good thing, but perhaps was not desired. Since you now have a list, you can sequentially apply the as.character function to recover only the character vectors:


> lapply(tapply(table1$nicknames, table1$name, list), as.character)
$A
[1] "A1"

$B
[1] "B1" "B2" "B3"

$C
[1] "C1" "C2" "C3" "C4" "C5"

Then I saw the rest of your request, so forget the above and see if this two-liner looks a bit simpler.

> tcollapse <- tapply(table1$nicknames, table1$name, paste,collapse=", ")

#gets you the strings separated by commas and spaces.

> cbind(names(tcollapse), tcollapse, lapply( tapply(table1$nicknames,table1$name, list), length) )

      tcollapse
A "A" "A1"                 1
B "B" "B1, B2, B3"         3
C "C" "C1, C2, C3, C4, C5" 5

You can obviously name them whatever you like.
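
The same tapply() pattern extends to a per-group mean, which the original question also asks for. The following is only a sketch under stated assumptions: the data are rebuilt with value kept numeric (not via cbind()), and the result is assembled with data.frame() rather than cbind() so that the mean column is not coerced to character. The names nick, avg and fin are chosen for illustration.

```r
# Rebuild the example data (assumed construction; keeps `value` numeric)
table1 <- data.frame(
  name      = c("A","B","C","B","C","C","C","B","C"),
  nicknames = c("A1","B1","C1","B2","C2","C3","C4","B3","C5"),
  value     = c(4,5,9,2,7,6,3,6,7),
  stringsAsFactors = FALSE
)

# One vectorised pass per output column; both calls group by the same
# variable, so the results come back in the same (sorted) group order
nick <- tapply(table1$nicknames, table1$name, paste, collapse = ", ")
avg  <- tapply(table1$value, table1$name, mean)

# data.frame() keeps VALUE numeric, unlike cbind(), which would
# convert every column to character
fin <- data.frame(
  NAME     = names(nick),
  NICKNAME = as.vector(nick),
  VALUE    = as.vector(avg),
  stringsAsFactors = FALSE
)
fin
```

Because there is no explicit loop over rows, this should scale to tens of thousands of rows far better than row-by-row assignment into a dataframe, though no timings were measured here.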

--
David

- the third one should contain the mean value of the numbers which correspond to the same A, B or C
1 A A1 mean(4)
2 B B1,B2,B3 mean(5,2,6)
3 C C1,C2,C3,C4,C5 mean(9,7,6,3,7)

I did this using a 'for' loop.
To be clear, I created three dataframes which correspond to each of the columns, and will finally combine them
ulist=which(!duplicated(table$name)) # I extract the list of positions in which I don't have duplications
name1=data.frame(table$name[ulist]) # I extract the list of unique names
nicknames1=data.frame(row.names(1:length(ulist))) # I create a dataframe of dimension equal to unique list length
value1=data.frame(row.names(1:length(ulist))) # I create a dataframe of dimension equal to unique list length
for(i in 1:length(ulist)) {
position=which(as.character(name1[i,1])==table$name)
nicknames1[i,1]=toString(table$nicknames[position])
value1[i,1]=mean(as.numeric(table$value[position]))
}
fin=cbind(name1,nicknames1,value1)
colnames(fin)=c("NAME","NICKNAME","VALUE")
fin
NAME NICKNAME VALUE
1 A A1 3.000000
2 B B1, B2, B3 3.333333
3 C C1, C2, C3, C4, C5 5.200000
It works successfully. But in general I work with dataframes of high dimensions (tens of thousands of rows or more). So my loop works too slowly (e.g., a dataframe of 20000 rows and 3 columns is processed in about 10 minutes). I intend to integrate it into a function, so it is obvious that the time will be even longer.
If someone can advise me on any way to modify what I have done, or on another way I can do it, please send me a message.
Kind regards to all the guys who develop and maintain R sources for such dummies as me
Alex Levitchi
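
A note on the VALUE column printed above: 3.0, 3.33 and 5.2 are not the means of the original numbers, which would be 4, 4.33 and 6.4. They are consistent with value having been stored as a factor (which data.frame(cbind(...)) produced under the pre-4.0 default of stringsAsFactors = TRUE), so that as.numeric() returned the internal level codes rather than the numbers. The sketch below reproduces both results; the explicit factor() call is an assumption used to mimic that behaviour on any R version.

```r
# `value` stored as a factor, mimicking data.frame(cbind(...)) in R < 4.0
value <- factor(c(4,5,9,2,7,6,3,6,7))
name  <- c("A","B","C","B","C","C","C","B","C")

codes <- as.numeric(value)                # internal level codes, not the data
nums  <- as.numeric(as.character(value))  # the original numbers

tapply(codes, name, mean)  # reproduces the VALUE column of the loop output
tapply(nums,  name, mean)  # means of the actual values
```

Converting through as.character() first, or avoiding the factor altogether by not using cbind(), gives the intended means.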




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT
