Nonsense. You are not telling us something or I have failed to understand something.
Consider: v1 = c("a","b") v2 = "c("a","a") It is not possible to change the value of a sum of values corresponding to v2="a" without also changing that for v1, which is not supposed to change according to my understanding of your specification. So I'm done. -- Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." Clifford Stoll On Sun, Mar 22, 2015 at 8:28 AM, Luca Meyer <lucam1...@gmail.com> wrote: > Sorry forgot to keep the rest of the group in the loop - Luca > ---------- Forwarded message ---------- > From: Luca Meyer <lucam1...@gmail.com> > Date: 2015-03-22 16:27 GMT+01:00 > Subject: Re: [R] Joining two datasets - recursive procedure? > To: Bert Gunter <gunter.ber...@gene.com> > > > Hi Bert, > > That is exactly what I am trying to achieve. Please notice that negative v4 > values are allowed. I have done a similar task in the past manually by > recursively alterating v4 distribution across v3 categories within fix each > v1&v2 combination so I am quite positive it can be achieved but honestly I > took me forever to do it manually and since this is likely to be an > exercise I need to repeat from time to time I wish I could learn how to do > it programmatically.... > > Thanks again for any further suggestion you might have, > > Luca > > > 2015-03-22 16:05 GMT+01:00 Bert Gunter <gunter.ber...@gene.com>: > >> Oh, wait a minute ... >> >> You still want the marginals for the other columns to be as originally? >> >> If so, then this is impossible in general as the sum of all the values >> must be what they were originally and you cannot therefore choose your >> values for V3 arbitrarily. >> >> Or at least, that seems to be what you are trying to do. >> >> -- Bert >> >> Bert Gunter >> Genentech Nonclinical Biostatistics >> (650) 467-7374 >> >> "Data is not information. Information is not knowledge. And knowledge >> is certainly not wisdom." >> Clifford Stoll >> >> >> >> >> On Sun, Mar 22, 2015 at 7:55 AM, Bert Gunter <bgun...@gene.com> wrote: >> > I would have thought that this is straightforward given my previous >> email... >> > >> > Just set z to what you want -- e,g, all B values to 29/number of B's, >> > and all C values to 2.567/number of C's (etc. for more categories). >> > >> > A slick but sort of cheat way to do this programmatically -- in the >> > sense that it relies on the implementation of factor() rather than its >> > API -- is: >> > >> > y <- f1$v3 ## to simplify the notation; could be done using with() >> > z <- (c(29,2.567)/table(y))[c(y)] >> > >> > Then proceed to z1 as I previously described >> > >> > -- Bert >> > >> > >> > Bert Gunter >> > Genentech Nonclinical Biostatistics >> > (650) 467-7374 >> > >> > "Data is not information. Information is not knowledge. And knowledge >> > is certainly not wisdom." >> > Clifford Stoll >> > >> > >> > >> > >> > On Sun, Mar 22, 2015 at 2:00 AM, Luca Meyer <lucam1...@gmail.com> wrote: >> >> Hi Bert, hello R-experts, >> >> >> >> I am close to a solution but I still need one hint w.r.t. the following >> >> procedure (available also from >> >> https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0) >> >> >> >> rm(list=ls()) >> >> >> >> # this is (an extract of) the INPUT file I have: >> >> f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", >> >> "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", >> >> "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C", >> >> "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917, 1.05786, 0.00042, >> 2.37232, >> >> 3.01835, 0, 1.13430, 0.92872, 0)), .Names = c("v1", "v2", "v3", "v4"), >> class >> >> = "data.frame", row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, >> 167L, >> >> 197L, 204L, 206L)) >> >> >> >> # this is the procedure that Bert suggested (slightly adjusted): >> >> z <- rnorm(nrow(f1)) ## or anything you want >> >> z1 <- round(with(f1,v4 + z -ave(z,v1,v2,FUN=mean)), digits=5) >> >> aggregate(v4~v1*v2,f1,sum) >> >> aggregate(z1~v1*v2,f1,sum) >> >> aggregate(v4~v3,f1,sum) >> >> aggregate(z1~v3,f1,sum) >> >> >> >> My question to you is: how can I set z so that I can obtain specific >> values >> >> for z1-v4 in the v3 aggregation? >> >> In other words, how can I configure the procedure so that e.g. B=29 and >> >> C=2.56723 after running the procedure: >> >> aggregate(z1~v3,f1,sum) >> >> >> >> Thank you, >> >> >> >> Luca >> >> >> >> PS: to avoid any doubts you might have about who I am the following is >> my >> >> web page: http://lucameyer.wordpress.com/ >> >> >> >> >> >> 2015-03-21 18:13 GMT+01:00 Bert Gunter <gunter.ber...@gene.com>: >> >>> >> >>> ... or cleaner: >> >>> >> >>> z1 <- with(f1,v4 + z -ave(z,v1,v2,FUN=mean)) >> >>> >> >>> >> >>> Just for curiosity, was this homework? (in which case I should >> >>> probably have not provided you an answer -- that is, assuming that I >> >>> HAVE provided an answer). >> >>> >> >>> Cheers, >> >>> Bert >> >>> >> >>> Bert Gunter >> >>> Genentech Nonclinical Biostatistics >> >>> (650) 467-7374 >> >>> >> >>> "Data is not information. Information is not knowledge. And knowledge >> >>> is certainly not wisdom." >> >>> Clifford Stoll >> >>> >> >>> >> >>> >> >>> >> >>> On Sat, Mar 21, 2015 at 7:53 AM, Bert Gunter <bgun...@gene.com> wrote: >> >>> > z <- rnorm(nrow(f1)) ## or anything you want >> >>> > z1 <- f1$v4 + z - with(f1,ave(z,v1,v2,FUN=mean)) >> >>> > >> >>> > >> >>> > aggregate(v4~v1,f1,sum) >> >>> > aggregate(z1~v1,f1,sum) >> >>> > aggregate(v4~v2,f1,sum) >> >>> > aggregate(z1~v2,f1,sum) >> >>> > aggregate(v4~v3,f1,sum) >> >>> > aggregate(z1~v3,f1,sum) >> >>> > >> >>> > >> >>> > Cheers, >> >>> > Bert >> >>> > >> >>> > Bert Gunter >> >>> > Genentech Nonclinical Biostatistics >> >>> > (650) 467-7374 >> >>> > >> >>> > "Data is not information. Information is not knowledge. And knowledge >> >>> > is certainly not wisdom." >> >>> > Clifford Stoll >> >>> > >> >>> > >> >>> > >> >>> > >> >>> > On Sat, Mar 21, 2015 at 6:49 AM, Luca Meyer <lucam1...@gmail.com> >> wrote: >> >>> >> Hi Bert, >> >>> >> >> >>> >> Thank you for your message. I am looking into ave() and tapply() as >> you >> >>> >> suggested but at the same time I have prepared a example of input >> and >> >>> >> output >> >>> >> files, just in case you or someone else would like to make an >> attempt >> >>> >> to >> >>> >> generate a code that goes from input to output. >> >>> >> >> >>> >> Please see below or download it from >> >>> >> https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0 >> >>> >> >> >>> >> # this is (an extract of) the INPUT file I have: >> >>> >> f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", >> >>> >> "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", >> >>> >> "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", >> >>> >> "B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806,0.00273, >> >>> >> 1.42917, >> >>> >> 1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872, >> >>> >> 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", >> >>> >> row.names = >> >>> >> c(2L, >> >>> >> 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L)) >> >>> >> >> >>> >> # this is (an extract of) the OUTPUT file I would like to obtain: >> >>> >> f2 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", >> >>> >> "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", >> >>> >> "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", >> >>> >> "B", "B", "B", "C", "C", "C"), v4 = c(17.83529, 3.43806,0.00295, >> >>> >> 1.77918, >> >>> >> 1.05786, 0.0002, 2.37232, 3.01835, 0, 1.13430, 0.92872, >> >>> >> 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", >> >>> >> row.names = >> >>> >> c(2L, >> >>> >> 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L)) >> >>> >> >> >>> >> # please notice that while the aggregated v4 on v3 has changed … >> >>> >> aggregate(f1[,c("v4")],list(f1$v3),sum) >> >>> >> aggregate(f2[,c("v4")],list(f2$v3),sum) >> >>> >> >> >>> >> # … the aggregated v4 over v1xv2 has remained unchanged: >> >>> >> aggregate(f1[,c("v4")],list(f1$v1,f1$v2),sum) >> >>> >> aggregate(f2[,c("v4")],list(f2$v1,f2$v2),sum) >> >>> >> >> >>> >> Thank you very much in advance for your assitance. >> >>> >> >> >>> >> Luca >> >>> >> >> >>> >> 2015-03-21 13:18 GMT+01:00 Bert Gunter <gunter.ber...@gene.com>: >> >>> >>> >> >>> >>> 1. Still not sure what you mean, but maybe look at ?ave and >> ?tapply, >> >>> >>> for which ave() is a wrapper. >> >>> >>> >> >>> >>> 2. You still need to heed the rest of Jeff's advice. >> >>> >>> >> >>> >>> Cheers, >> >>> >>> Bert >> >>> >>> >> >>> >>> Bert Gunter >> >>> >>> Genentech Nonclinical Biostatistics >> >>> >>> (650) 467-7374 >> >>> >>> >> >>> >>> "Data is not information. Information is not knowledge. And >> knowledge >> >>> >>> is certainly not wisdom." >> >>> >>> Clifford Stoll >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> On Sat, Mar 21, 2015 at 4:53 AM, Luca Meyer <lucam1...@gmail.com> >> >>> >>> wrote: >> >>> >>> > Hi Jeff & other R-experts, >> >>> >>> > >> >>> >>> > Thank you for your note. I have tried myself to solve the issue >> >>> >>> > without >> >>> >>> > success. >> >>> >>> > >> >>> >>> > Following your suggestion, I am providing a sample of the >> dataset I >> >>> >>> > am >> >>> >>> > using below (also downloadble in plain text from >> >>> >>> > https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0): >> >>> >>> > >> >>> >>> > #this is an extract of the overall dataset (n=1200 cases) >> >>> >>> > f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", >> "B", >> >>> >>> > "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", >> >>> >>> > "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", >> >>> >>> > "B", "B", "B", "C", "C", "C"), v4 = c(18.1853007621835, >> >>> >>> > 3.43806581506388, >> >>> >>> > 0.002733567617055, 1.42917483425029, 1.05786640463504, >> >>> >>> > 0.000420548864162308, >> >>> >>> > 2.37232740842861, 3.01835841813241, 0, 1.13430282139936, >> >>> >>> > 0.928725667117666, >> >>> >>> > 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", >> >>> >>> > row.names >> >>> >>> > = >> >>> >>> > c(2L, >> >>> >>> > 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L)) >> >>> >>> > >> >>> >>> > I need to find a automated procedure that allows me to adjust v3 >> >>> >>> > marginals >> >>> >>> > while maintaining v1xv2 marginals unchanged. >> >>> >>> > >> >>> >>> > That is: modify the v4 values you can find by running: >> >>> >>> > >> >>> >>> > aggregate(f1[,c("v4")],list(f1$v3),sum) >> >>> >>> > >> >>> >>> > while maintaining costant the values you can find by running: >> >>> >>> > >> >>> >>> > aggregate(f1[,c("v4")],list(f1$v1,f1$v2),sum) >> >>> >>> > >> >>> >>> > Now does it make sense? >> >>> >>> > >> >>> >>> > Please notice I have tried to build some syntax that tries to >> modify >> >>> >>> > values >> >>> >>> > within each v1xv2 combination by computing sum of v4, row >> percentage >> >>> >>> > in >> >>> >>> > terms of v4, and there is where my effort is blocked. Not really >> >>> >>> > sure >> >>> >>> > how I >> >>> >>> > should proceed. Any suggestion? >> >>> >>> > >> >>> >>> > Thanks, >> >>> >>> > >> >>> >>> > Luca >> >>> >>> > >> >>> >>> > >> >>> >>> > 2015-03-19 2:38 GMT+01:00 Jeff Newmiller < >> jdnew...@dcn.davis.ca.us>: >> >>> >>> > >> >>> >>> >> I don't understand your description. The standard practice on >> this >> >>> >>> >> list >> >>> >>> >> is >> >>> >>> >> to provide a reproducible R example [1] of the kind of data you >> are >> >>> >>> >> working >> >>> >>> >> with (and any code you have tried) to go along with your >> >>> >>> >> description. >> >>> >>> >> In >> >>> >>> >> this case, that would be two dputs of your input data frames >> and a >> >>> >>> >> dput >> >>> >>> >> of >> >>> >>> >> an output data frame (generated by hand from your input data >> >>> >>> >> frame). >> >>> >>> >> (Probably best to not use the full number of input values just >> to >> >>> >>> >> keep >> >>> >>> >> the >> >>> >>> >> size down.) We could then make an attempt to generate code that >> >>> >>> >> goes >> >>> >>> >> from >> >>> >>> >> input to output. >> >>> >>> >> >> >>> >>> >> Of course, if you post that hard work using HTML then it will >> get >> >>> >>> >> corrupted (much like the text below from your earlier emails) >> and >> >>> >>> >> we >> >>> >>> >> won't >> >>> >>> >> be able to use it. Please learn to post from your email software >> >>> >>> >> using >> >>> >>> >> plain text when corresponding with this mailing list. >> >>> >>> >> >> >>> >>> >> [1] >> >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example >> >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> --------------------------------------------------------------------------- >> >>> >>> >> Jeff Newmiller The ..... >> ..... Go >> >>> >>> >> Live... >> >>> >>> >> DCN:<jdnew...@dcn.davis.ca.us> Basics: ##.#. ##.#. >> >>> >>> >> Live >> >>> >>> >> Go... >> >>> >>> >> Live: OO#.. Dead: OO#.. >> >>> >>> >> Playing >> >>> >>> >> Research Engineer (Solar/Batteries O.O#. #.O#. >> >>> >>> >> with >> >>> >>> >> /Software/Embedded Controllers) .OO#. .OO#. >> >>> >>> >> rocks...1k >> >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> --------------------------------------------------------------------------- >> >>> >>> >> Sent from my phone. Please excuse my brevity. >> >>> >>> >> >> >>> >>> >> On March 18, 2015 9:05:37 AM PDT, Luca Meyer < >> lucam1...@gmail.com> >> >>> >>> >> wrote: >> >>> >>> >> >Thanks for you input Michael, >> >>> >>> >> > >> >>> >>> >> >The continuous variable I have measures quantities (down to the >> >>> >>> >> > 3rd >> >>> >>> >> >decimal level) so unfortunately are not frequencies. >> >>> >>> >> > >> >>> >>> >> >Any more specific suggestions on how that could be tackled? >> >>> >>> >> > >> >>> >>> >> >Thanks & kind regards, >> >>> >>> >> > >> >>> >>> >> >Luca >> >>> >>> >> > >> >>> >>> >> > >> >>> >>> >> >=== >> >>> >>> >> > >> >>> >>> >> >Michael Friendly wrote: >> >>> >>> >> >I'm not sure I understand completely what you want to do, but >> >>> >>> >> >if the data were frequencies, it sounds like task for fitting a >> >>> >>> >> >loglinear model with the model formula >> >>> >>> >> > >> >>> >>> >> >~ V1*V2 + V3 >> >>> >>> >> > >> >>> >>> >> >On 3/18/2015 2:17 AM, Luca Meyer wrote: >> >>> >>> >> >>* Hello, >> >>> >>> >> >*>>* I am facing a quite challenging task (at least to me) and >> I >> >>> >>> >> > was >> >>> >>> >> >wondering >> >>> >>> >> >*>* if someone could advise how R could assist me to speed the >> >>> >>> >> > task >> >>> >>> >> > up. >> >>> >>> >> >*>>* I am dealing with a dataset with 3 discrete variables and >> one >> >>> >>> >> >continuous >> >>> >>> >> >*>* variable. The discrete variables are: >> >>> >>> >> >*>>* V1: 8 modalities >> >>> >>> >> >*>* V2: 13 modalities >> >>> >>> >> >*>* V3: 13 modalities >> >>> >>> >> >*>>* The continuous variable V4 is a decimal number always >> greater >> >>> >>> >> > than >> >>> >>> >> >zero in >> >>> >>> >> >*>* the marginals of each of the 3 variables but it is >> sometimes >> >>> >>> >> > equal >> >>> >>> >> >to zero >> >>> >>> >> >*>* (and sometimes negative) in the joint tables. >> >>> >>> >> >*>>* I have got 2 files: >> >>> >>> >> >*>>* => one with distribution of all possible combinations of >> >>> >>> >> > V1xV2 >> >>> >>> >> >(some of >> >>> >>> >> >*>* which are zero or neagtive) and >> >>> >>> >> >*>* => one with the marginal distribution of V3. >> >>> >>> >> >*>>* I am trying to build the long and narrow dataset V1xV2xV3 >> in >> >>> >>> >> > such >> >>> >>> >> >a way >> >>> >>> >> >*>* that each V1xV2 cell does not get modified and V3 fits as >> >>> >>> >> > closely >> >>> >>> >> >as >> >>> >>> >> >*>* possible to its marginal distribution. Does it make sense? >> >>> >>> >> >*>>* To be even more specific, my 2 input files look like the >> >>> >>> >> >following. >> >>> >>> >> >*>>* FILE 1 >> >>> >>> >> >*>* V1,V2,V4 >> >>> >>> >> >*>* A, A, 24.251 >> >>> >>> >> >*>* A, B, 1.065 >> >>> >>> >> >*>* (...) >> >>> >>> >> >*>* B, C, 0.294 >> >>> >>> >> >*>* B, D, 2.731 >> >>> >>> >> >*>* (...) >> >>> >>> >> >*>* H, L, 0.345 >> >>> >>> >> >*>* H, M, 0.000 >> >>> >>> >> >*>>* FILE 2 >> >>> >>> >> >*>* V3, V4 >> >>> >>> >> >*>* A, 1.575 >> >>> >>> >> >*>* B, 4.294 >> >>> >>> >> >*>* C, 10.044 >> >>> >>> >> >*>* (...) >> >>> >>> >> >*>* L, 5.123 >> >>> >>> >> >*>* M, 3.334 >> >>> >>> >> >*>>* What I need to achieve is a file such as the following >> >>> >>> >> >*>>* FILE 3 >> >>> >>> >> >*>* V1, V2, V3, V4 >> >>> >>> >> >*>* A, A, A, ??? >> >>> >>> >> >*>* A, A, B, ??? >> >>> >>> >> >*>* (...) >> >>> >>> >> >*>* D, D, E, ??? >> >>> >>> >> >*>* D, D, F, ??? >> >>> >>> >> >*>* (...) >> >>> >>> >> >*>* H, M, L, ??? >> >>> >>> >> >*>* H, M, M, ??? >> >>> >>> >> >*>>* Please notice that FILE 3 need to be such that if I >> aggregate >> >>> >>> >> > on >> >>> >>> >> >V1+V2 I >> >>> >>> >> >*>* recover exactly FILE 1 and that if I aggregate on V3 I can >> >>> >>> >> > recover >> >>> >>> >> >a file >> >>> >>> >> >*>* as close as possible to FILE 3 (ideally the same file). >> >>> >>> >> >*>>* Can anyone suggest how I could do that with R? >> >>> >>> >> >*>>* Thank you very much indeed for any assistance you are >> able to >> >>> >>> >> >provide. >> >>> >>> >> >*>>* Kind regards, >> >>> >>> >> >*>>* Luca* >> >>> >>> >> > >> >>> >>> >> > [[alternative HTML version deleted]] >> >>> >>> >> > >> >>> >>> >> >______________________________________________ >> >>> >>> >> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, >> see >> >>> >>> >> >https://stat.ethz.ch/mailman/listinfo/r-help >> >>> >>> >> >PLEASE do read the posting guide >> >>> >>> >> >http://www.R-project.org/posting-guide.html >> >>> >>> >> >and provide commented, minimal, self-contained, reproducible >> code. >> >>> >>> >> >> >>> >>> >> >> >>> >>> > >> >>> >>> > [[alternative HTML version deleted]] >> >>> >>> > >> >>> >>> > ______________________________________________ >> >>> >>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, >> see >> >>> >>> > https://stat.ethz.ch/mailman/listinfo/r-help >> >>> >>> > PLEASE do read the posting guide >> >>> >>> > http://www.R-project.org/posting-guide.html >> >>> >>> > and provide commented, minimal, self-contained, reproducible >> code. >> >>> >> >> >>> >> >> >> >> >> >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.