> On Jul 13, 2016, at 6:48 AM, stn021 <stn...@gmail.com> wrote: > > Hello, > > so here a numerical example in R-code. Code is appended below. > > The output should be > 1) the numerical values of the abilities of the persons > 2) the multiplyer > > > Please note that > > 1) I have used non-linear optimization to solve this problem and got > the expected result, though not with R but other software. > > 2) I have applied lm() to this problem, even before I posted the > question. I am well aware of the syntax of formulas. I my last posting > I wrote the formula "freehand" so I made the previously mentioned > errors. Sorry about that. > > > > Unfortunately the formulas with I() as well as multiplying variables > before running R does not work here. I() does not apply to factors (R > tells me) and multiplying in advance also works only for continuous > variables, not for factors, because there is no known numerical value > to multiply. > > The latter is actually what my question is about, along with the > question on how to get R to treat two columns as two instances of the > same factor. > > > Just to be sure I used R to check if the data really counts as a > factor according to R-terminology. It really is a factor, see code > below. > > > > This is the code for generating the example-data: > > # --------------------------------------------------------------- # > pnames = c( "alice" , "bob" , "charlie" , "don" , "eve" , "freddy" > , "grace" , "henry" ) > pcount = length( pnames ) > > # abilities = runif( pcount ) > abilities = (1:pcount) / 10 > > persons = data.frame( name = pnames , ability = abilities ) > persons > > # random subset of possible combinations and extra df > combinations = combn( nrow( persons ) , 2 ) ; > combinations = cbind( combinations,combinations,combinations,combinations ) > combinations = combinations[ , runif(ncol(combinations))<0.5 ] > ccount = ncol( combinations ) > > observed_data = data.frame( > idx1 = combinations[1,] > , idx2 = combinations[2,] > , p1 = ( persons$name[ combinations[1,] ] ) > , p2 = ( persons$name[ combinations[2,] ] ) > ) > > abilities_data = data.frame( > a1 = persons$ability[ combinations[1,] ] > , a2 = persons$ability[ combinations[2,] ] > ) > > # y = result of cooperation of each pair > multiplyer = runif(1) + 1 > offset = 1 > cat( "multiplyer = " , multiplyer , "\n" ) > cat( "offset = " , offset , "\n" ) > > y0 = multiplyer * ( offset - ( abilities_data$a1 - abilities_data$a2 ) ^ 2 ) > noise = .05 * rnorm( ccount ) > > # check variables are really factors : > str( observed_data$p1 ) > dput( observed_data$p1 ) > > observed_data = data.frame( y = round( y0+noise,3 ) , observed_data ) > observed_data > > # --------------------------------------------------------------- #
Is this what is intended? > observed_data$p1ab <- persons$ability[ match(observed_data$p1, persons$name) ] > observed_data$p2ab <- persons$ability[ match(observed_data$p2, persons$name) ] > head(observed_data) y idx1 idx2 p1 p2 p1ab p2ab 1 1.149 1 6 alice freddy 0.1 0.6 2 1.006 1 7 alice grace 0.1 0.7 3 1.529 2 3 bob charlie 0.2 0.3 4 1.404 2 5 bob eve 0.2 0.5 5 1.205 2 6 bob freddy 0.2 0.6 6 1.187 2 7 bob grace 0.2 0.7 > lm( y ~ I( (p1ab -p2ab)^2 ), data=observed_data) Call: lm(formula = y ~ I((p1ab - p2ab)^2), data = observed_data) Coefficients: (Intercept) I((p1ab - p2ab)^2) 1.506 -1.435 > separate_term <- lm( y ~ I( (p1ab -p2ab)^2 ), data=observed_data) > summary(separate_term) Call: lm(formula = y ~ I((p1ab - p2ab)^2), data = observed_data) Residuals: Min 1Q Median 3Q Max -0.116249 -0.030996 0.002633 0.032765 0.136282 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.50589 0.01067 141.08 <2e-16 *** I((p1ab - p2ab)^2) -1.43527 0.05863 -24.48 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.05304 on 44 degrees of freedom Multiple R-squared: 0.9316, Adjusted R-squared: 0.93 F-statistic: 599.2 on 1 and 44 DF, p-value: < 2.2e-16 You could also have compared 2 models differing only with rest to the includion of an interaction term that was the squared difference in abilities: > full <- lm( y ~ p1ab + p2ab + I( (p1ab -p2ab)^2 ), data=observed_data) > reduced <- lm( y ~ p1ab + p2ab , data=observed_data) > anova(full,reduced) Analysis of Variance Table Model 1: y ~ p1ab + p2ab + I((p1ab - p2ab)^2) Model 2: y ~ p1ab + p2ab Res.Df RSS Df Sum of Sq F Pr(>F) 1 42 0.11823 2 43 0.17315 -1 -0.05492 19.509 6.892e-05 *** -- David > > > 2016-07-11 19:16 GMT+02:00 Jeff Newmiller <jdnew...@dcn.davis.ca.us>: >> Your clarification is promising. A reproducible example is always >> preferred, though never a guarantee. I expect to be somewhat preoccupied >> this week so responses may be rather delayed, but the less setup we have to >> the more likely that someone on the list will tackle it. >> >> Re an answer: If you can make the example simple enough that you can tell us >> what the right numerical result will be, we will have a better chance of >> understanding what you are after. E.g. if you start with a solution and use >> it to create sample input data with then you don't need to actually solve it >> to illustrate what you are after. [1] >> >> Note that I am not aware of any package dedicated to this type of problem, >> so unless someone else responds otherwise then you will likely have to use >> bootstrapping or your own statistical analysis (Bayesian?) of the result. >> >> [1] >> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example >> -- >> Sent from my phone. Please excuse my brevity. >> >> On July 11, 2016 7:28:41 AM PDT, stn021 <stn...@gmail.com> wrote: >>> Hello, >>> >>> thank you for the replies. Sorry about the html-email, I forgot. >>> Should be OK with this email. >>> >>> >>> Don't be fooled be the apparent simplicity of the problem. I have >>> tried to reduce it to only a single relatively simple question. >>> >>> The idea here is to model cooperation of two persons. The model is >>> about one specific aspect of that cooperation, namely that two persons >>> with similar abilities may be able to produce better results that two >>> very different persons. >>> >>> That is only one part of the model with other parts modeling for >>> example the fact that of course two persons with a higher degree of >>> ability will produce better results per se. >>> >>> >>> It is not classic regression with factors. That can be easily done by >>> something like lm( y ~ (p1-p2)^2 ). >>> >>> This expands to lm( y ~ p1^2 - 2*p1*p2 + p2^2 ). This contains a >>> multiplicagtions and for lm() this implies interactions between the >>> factor-levels and produces one parameter for each combination of >>> factor-levels that occurs in the data. That is not what the question >>> is about. >>> >>> Also p1 and p2 are different levels of the same factor, while for lm() >>> it would be two different factors with different levels. >>> >>> >>> As for the sensical part: this has a real world application therefore >>> it makes sense. >>> >>> Also it is not so difficult to solve with non-linear optimization. I >>> was hoping to be able to use R for that purpose because then the >>> results could easily be checked with statistical tests. >>> >>> So my question is not "how to solve" but "how to solve with R". >>> >>> >>> As for the excess degrees of freedom, in real observations there would >>> of course be added noise due to either random variations or factors >>> not included in the model. So to generate a more reality-conforming >>> example I could add some random normal-distributed noise to the >>> dependent variable y. I previously left that part out because to me it >>> did not seem relevant. >>> >>> >>> Would you like me to make a complete example dataset with more records >>> and noise ? >>> >>> >>> The answer I look for would be the numerical values of the >>> factor-levels and numerical values for the multiplier (f) and the >>> offset (o), with p1 and p2 given as names (here: persons) and y given >>> as some level of achievement they reach by cooperating. >>> >>> y = f * ( o - ( p1 - p2 )^2 ) >>> >>> Is that what you meant by "answer" ? >>> >>> >>> THX >>> stefan >>> >>> >>> >>> >>> 2016-07-10 2:27 GMT+02:00 Jeff Newmiller <jdnew...@dcn.davis.ca.us>: >>>> >>>> I have seen less sensical questions. >>>> >>>> It would be nice if the example were a bit more complete (as in it >>> should have excess degrees of freedom and an answer) and less like a >>> homework problem (which are off topic here). It would of course also be >>> helpful if the OP were to conform to the Posting Guide, particularly in >>> respect to using plain text email. >>>> >>>> It looks like the kind of nonlinear optimization problem that >>> evolutionary algorithms are often applied to. It doesn't look (to me) >>> like a typical problem that factors get applied to in formulas though, >>> because multiple instances of the same factor variable are present. >>>> -- >>>> Sent from my phone. Please excuse my brevity. >>>> >>>> On July 9, 2016 4:59:30 PM PDT, Rolf Turner <r.tur...@auckland.ac.nz> >>> wrote: >>>>> On 09/07/16 20:52, stn021 wrote: >>>>>> Hello, >>>>>> >>>>>> I would like to analyse a model like this: >>>>>> >>>>>> y = 1 * ( 1 - ( x1 - x2 ) ^ 2 ) >>>>>> >>>>>> x1 and x2 are not continuous variables but factors, so the >>>>> observation >>>>>> contain the level. >>>>>> Its numerical value is unknown and is to be estimated with the >>> model. >>>>>> >>>>>> >>>>>> The observations look like this: >>>>>> >>>>>> y x1 x2 >>>>>> 0.96 Alice Bob >>>>>> 0.84 Alice Charlie >>>>>> 0.96 Bob Charlie >>>>>> 0.64 Dave Alice >>>>>> etc. >>>>>> >>>>>> Each person has a numerical value. Here for example Alice = 0.2 >>> and >>>>> Bob = >>>>>> 0.4 >>>>>> >>>>>> Then y = 0.96 = 1* ( 1- ( 0.2-0.4 ) ^ 2 ) , see first observation. >>>>>> >>>>>> How can this be done in R ? >>>>> >>>>> >>>>> This question makes about as little sense as it is possible to >>> imagine. >>>>> >>>>> cheers, >>>>> >>>>> Rolf Turner >>>> >> > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.