Stupid Joh wants to give you a big hug! Thanks! Why "rank" works but "order" does not, I still have to figure out, though ...
Joh

On Monday 04 October 2010 17:30:32 peter dalgaard wrote:
> On Oct 4, 2010, at 16:57 , Johannes Graumann wrote:
> > Hi,
> >
> > I'm turning my wheels on this and keep coming around to the same wrong
> > solution - please have a look and give a hand ...
> >
> > The premise is: a DF like so
> >
> >> loremIpsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit.
> > Quisque leo ipsum, ultricies scelerisque volutpat non, volutpat et nulla.
> > Curabitur consequat ullamcorper tellus id imperdiet. Duis semper
> > malesuada nulla, blandit lobortis diam fringilla at. Vestibulum nec
> > tellus orci, eu sollicitudin quam. Phasellus sit amet enim diam.
> > Phasellus mattis hendrerit varius. Curabitur ut tristique enim. Lorem
> > ipsum dolor sit amet, consectetur adipiscing elit. Sed convallis, tortor
> > id vehicula facilisis, nunc justo facilisis tellus, sed eleifend nisi
> > lacus id purus. Maecenas tempus sollicitudin libero, molestie laoreet
> > metus dapibus eu. Mauris justo ante, mattis et pulvinar a, varius
> > pretium eros. Curabitur fringilla dui ac dui rutrum pretium. Donec sed
> > magna adipiscing nisi accumsan congue sed ac est. Vivamus lorem urna,
> > tristique quis accumsan quis, ullamcorper aliquet velit."
> >
> >> tmpDF <- data.frame(Column1=rep(unlist(strsplit(loremIpsum," ")),length.out=510),Column2=runif(510,min=0,max=1e8))
> >
> > is to be split into DFs with 50 entries in an ordered manner according to
> > Column2 (first DF is to contain the rows with the 50 largest numbers,
> > ...).
> >
> > Here is what I have been doing:
> >> binSize <- 50
> >> splitMembership <- pmin(ceiling(order(tmpDF[["Column2"]],decreasing=TRUE)/binSize),floor(nrow(tmpDF)/binSize))
> >> splitList <- split(tmpDF,splitMembership)
> >
> > Distribution seems to work ...
> >
> >> sapply(splitList,nrow)
> >
> > But this is NOT what I wanted ...
> >
> >> sapply(splitList,function(x){max(x[["Column2"]])})
> >
> > This was supposed to give me bins that are Column2-sorted and bin one
> > should have a higher max than 2 than 3 ...
> >
> > Can anyone point out where (my now 3 reimplementations) fail?
> >
> > Thanks, Stupid Joh
>
> Dear Stupid Joh,
>
> Have you considered something along the lines of
>
> o <- order(-x$Column2)
> xx <- x[o,]
> split(xx, (seq_len(NROW(x))-1) %/% 50)
>
> The above is a bit hard to follow, but it seems to work better with rank() instead of order():
>
> > splitMembership <-
> + pmin(ceiling(rank(-tmpDF[["Column2"]])/binSize),floor(nrow(tmpDF)/binSize))
> > splitList <- split(tmpDF,splitMembership)
> > sapply(splitList,nrow)
>
>  1  2  3  4  5  6  7  8  9 10
> 50 50 50 50 50 50 50 50 50 60
>
> > sapply(splitList,function(x){max(x[["Column2"]])})
>
>        1        2        3        4        5        6
> 99877498 90567877 81965382 69112280 59814266 52130373
>        7        8        9       10
> 41557660 32630212 21226996 11880032
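
On the open question of why rank() works where order() does not, here is a minimal sketch on a toy vector. The vector x and binSize of 2 are made up for illustration and are not from the thread. As far as I understand it, order() returns a permutation meant for subsetting (element i is the index of the i-th largest value), whereas rank() returns one label per element (element i is the position of x[i] in the sorted vector). split() needs a per-element label, so only rank() fits directly. Ties are assumed absent here; with ties, rank() defaults to averaged (non-integer) ranks.

```r
# Toy data (hypothetical, not from the thread): four distinct values.
x <- c(30, 10, 40, 20)
binSize <- 2

# order(-x)[i] is the INDEX of the i-th largest value: use it to subset.
o <- order(-x)   # 3 1 4 2
# rank(-x)[i] is the POSITION of x[i] in decreasing order: use it as a label.
r <- rank(-x)    # 2 4 1 3

# Without ties the two are inverse permutations of each other.
stopifnot(all(order(o) == r))

# Labels from rank(): bin 1 holds the largest values, as intended.
goodBins <- split(x, ceiling(r / binSize))
# Labels from order(): bin membership is unrelated to the size of each value.
badBins <- split(x, ceiling(o / binSize))

print(sapply(goodBins, max))   # bin 1: 40, bin 2: 20
print(sapply(badBins, max))    # bin 1: 20, bin 2: 40

# The sort-first alternative: subset with order(), then cut by row position.
sortedBins <- split(x[o], (seq_along(x) - 1) %/% binSize)
print(sapply(sortedBins, max)) # bin 0: 40, bin 1: 20
```

The last three lines follow the order()-then-split() idea quoted above, applied to a plain vector instead of a data frame; the bins come out named from 0 rather than 1 because of the integer division.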
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.