Christofer,

That was a detailed follow-up; you clarified the requirements precisely enough to give me a clear position to proceed from.
To implement this, we need a constrained optimization procedure that estimates an ordinal logistic regression model (cumulative logit), based on:

1. Estimated Cutpoints
   - No intercept in the linear predictor
   - Cutpoints (thresholds) will be estimated directly from the data

2. Coefficient Constraints
   - Box constraints on each coefficient
   - For now: lower = c(1, -1, 0) and upper = c(2, 1, 1)
   - These apply to pared, public, and gpa (in that order)

3. Sum Constraint
   - The sum of the coefficients must equal 1.60

4. Data to use
   - The IDRE ologit.dta dataset from UCLA (for now)

Technical Approach

• Write a custom negative log-likelihood function using the cumulative logit formulation

  P(Y ≤ k | X) = 1 / (1 + exp(-(θ_k - Xβ)))

  and derive P(Y = k) from adjacent differences of these cumulative probabilities.

• The cutpoints θ_k will be estimated as separate parameters, constrained to be strictly increasing for identifiability.

• The optimization will use nloptr::nloptr(), which supports:
  - lower/upper bounds (box constraints)
  - equality constraints (for the sum of β)
  - nonlinear inequality constraints (to keep the cutpoints ordered)

I'll have some time in the next day or two to attempt a script with:

- a custom negative log-likelihood
- the parameter vector split into β and cutpoints
- constraint functions: sum(β) = 1.60 and increasing cutpoints
- optimization via nloptr()

Hopefully you'll be able to run it locally with only the VGAM, foreign, and nloptr packages. I'll send the .R file along with the next email. A best attempt, anyway.
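In the meantime, here's a rough, untested sketch of the shape I expect that script to take. Treat everything below as an assumption to be checked rather than a finished implementation: the column names (pared, public, gpa) and the 3-level apply outcome follow the IDRE dataset; the starting values are just a feasible guess (the betas sum to 1.60 and sit inside the bounds); and the 1e-6 gap in the ordering constraint and the pmax() floor inside the likelihood are arbitrary safeguards. SLSQP is gradient-based, so numerical gradients are supplied via nloptr's nl.grad()/nl.jacobian() helpers.

# --- Rough sketch (untested): constrained cumulative-logit fit ---
library(foreign)   # read.dta()
library(nloptr)    # nloptr(), nl.grad(), nl.jacobian()

dat <- read.dta("https://stats.idre.ucla.edu/stat/data/ologit.dta")
X <- as.matrix(dat[, c("pared", "public", "gpa")])
y <- as.integer(dat$apply)    # ordered factor -> 1..K
K <- length(unique(y))        # 3 outcome categories
p <- ncol(X)                  # 3 coefficients

# Parameter vector: par = c(beta (p values), theta (K - 1 cutpoints))
negLL <- function(par) {
  beta  <- par[1:p]
  theta <- par[(p + 1):(p + K - 1)]
  eta   <- drop(X %*% beta)
  # Cumulative probabilities P(Y <= k | X), padded with 0 and 1
  cum <- cbind(0, plogis(outer(-eta, theta, "+")), 1)   # n x (K + 1)
  pk  <- cum[cbind(seq_along(y), y + 1)] - cum[cbind(seq_along(y), y)]
  -sum(log(pmax(pk, 1e-12)))  # floor guards against log(0)
}

# Equality constraint: sum(beta) - 1.60 = 0
eq_con <- function(par) sum(par[1:p]) - 1.60

# Inequality constraint (must be <= 0): cutpoints strictly increasing
ineq_con <- function(par) {
  theta <- par[(p + 1):(p + K - 1)]
  1e-6 - diff(theta)
}

res <- nloptr(
  x0 = c(1.2, 0.2, 0.2, -1, 1),   # betas sum to 1.60; cutpoints ordered
  eval_f = negLL,
  eval_grad_f = function(par) nl.grad(par, negLL),
  lb = c(1, -1, 0, -Inf, -Inf),   # box constraints on beta; cutpoints free
  ub = c(2, 1, 1, Inf, Inf),
  eval_g_eq = eq_con,
  eval_jac_g_eq = function(par) nl.jacobian(par, eq_con),
  eval_g_ineq = ineq_con,
  eval_jac_g_ineq = function(par) nl.jacobian(par, ineq_con),
  opts = list(algorithm = "NLOPT_LD_SLSQP", xtol_rel = 1e-8, maxeval = 2000)
)
res$solution   # c(b_pared, b_public, b_gpa, theta_1, theta_2)

If SLSQP stalls, nloptr's augmented-Lagrangian algorithms would be my next try, and an unconstrained VGAM fit, e.g. vglm(apply ~ pared + public + gpa, family = cumulative(parallel = TRUE), data = dat), makes a reasonable sanity benchmark. More once I've actually run it.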
r/
Gregg

"Oh, what fun it is!"
—Alice, Alice's Adventures in Wonderland by Lewis Carroll

On Thursday, April 24th, 2025 at 1:56 AM, Christofer Bogaso <bogaso.christo...@gmail.com> wrote:

> Hi Gregg,
>
> Many thanks for your detailed feedback; those points are really super
> helpful.
>
> I have ordered a copy of Agresti from our local library, but it appears
> that I will need to wait a few days for it.
>
> Regarding my queries, it would be super helpful if we could build an
> optimization algorithm based on the criteria below:
>
> 1. Whether you want the cutpoints fixed (and to what values), or if
> you want them estimated separately (with identifiability managed some
> other way): I would like to have the cutpoints estimated directly from
> the data.
> 2. What your bounds on the coefficients are (lower/upper vectors): I
> have a different bound for each coefficient, so I will have one vector
> of lower bounds and another of upper bounds. To start with, let us take
> lower = c(1, -1, 0) and upper = c(2, 1, 1) for the variables pared,
> public, and gpa. There will be no intercept in my model, hence no
> question about its bounds.
> 3. What value the sum of coefficients should equal (e.g., sum(β) = 1,
> or something else): let the sum be 1.60.
> 4. And whether you're working with the IDRE example data, or something
> else: my original data actually resides on a computer without internet
> access (for data security). However, we can start with any suitable
> data for this experiment, and one such dataset is
> https://stats.idre.ucla.edu/stat/data/ologit.dta. I am still exploring
> whether I can extract a randomized copy of the actual data; if I can,
> I will share it once available.
>
> Again, many thanks for your time and insights.
>
> Thanks and regards,
>
> On Wed, Apr 23, 2025 at 9:54 PM Gregg Powell g.a.pow...@protonmail.com wrote:
>
> > Hello again Christofer,
> >
> > Thanks for your thoughtful note — I'm glad the outline was helpful.
> > Apologies for the long delay getting back to you; I had to do a small
> > bit of research…
> >
> > Recommended Text on Ordinal Log-Likelihoods:
> >
> > You're right that most online sources jump straight to code or canned
> > functions. For a solid theoretical treatment of ordinal models and
> > their likelihoods, consider:
> >
> > "Categorical Data Analysis" by Alan Agresti
> > - Especially Chapters 7 and 8 on ordinal logistic regression.
> > - Covers proportional odds models, cumulative logits, adjacent-category
> > logits, and the derivation of likelihood functions.
> > - Provides not only the equations but also the intuition behind the
> > model structure.
> >
> > It's a standard reference in the field and explains how to build
> > log-likelihoods from first principles, including how the cumulative
> > probabilities arise and why the difference of CDFs gives a
> > category-specific probability.
> >
> > Also helpful: "An Introduction to Categorical Data Analysis" by
> > Agresti (2nd ed.) is a bit more accessible and still covers the
> > basics of ordinal models.
> >
> > ________________________________________
> >
> > If You Want to Proceed Practically:
> >
> > In parallel with the theory, if you'd like a working R example coded
> > from scratch, with:
> >
> > • a custom likelihood for an ordinal (cumulative logit) model,
> > • fixed thresholds / no intercept,
> > • coefficient bounds,
> > • and a sum constraint on β,
> >
> > I'd be happy to attempt that using nloptr() or constrOptim(). You'd
> > be able to walk through it and tweak it as necessary (no guarantee
> > that I will get it right 😊)
> >
> > Just let me know:
> >
> > 1. Whether you want the cutpoints fixed (and to what values), or if
> > you want them estimated separately (with identifiability managed some
> > other way);
> > 2. What your bounds on the coefficients are (lower/upper vectors);
> > 3. What value the sum of coefficients should equal (e.g., sum(β) = 1,
> > or something else);
> > 4. And whether you're working with the IDRE example data, or
> > something else.
> >
> > I can use the UCLA ologit.dta dataset as a basis if that's easiest to
> > demo on, or if you have another dataset you'd prefer – again, let me
> > know.
> >
> > All the best!
> >
> > gregg
> >
> > On Monday, April 21st, 2025 at 11:25 AM, Christofer Bogaso
> > bogaso.christo...@gmail.com wrote:
> >
> > > Hi Gregg,
> > >
> > > I am sincerely thankful for this workout.
> > >
> > > Could you please suggest a textbook on how to construct the
> > > log-likelihood for an ordinal model like this? Most of my online
> > > searches point me directly to some R function or other, but a
> > > theoretical discussion of this subject would be really helpful for
> > > constructing it myself.
> > >
> > > Thanks and regards,
> > >
> > > On Mon, Apr 21, 2025 at 9:55 PM Gregg Powell g.a.pow...@protonmail.com
> > > wrote:
> > >
> > > > Christofer,
> > > >
> > > > Given the constraints you mentioned—bounded parameters, no
> > > > intercept, and a sum constraint—you're outside the capabilities of
> > > > most off-the-shelf ordinal logistic regression functions in R like
> > > > vglm or polr.
> > > >
> > > > The most flexible recommendation at this point is to implement
> > > > custom likelihood optimization using constrOptim() or
> > > > nloptr::nloptr() with a manually coded log-likelihood function for
> > > > the cumulative logit model.
> > > > Since you need:
> > > >
> > > > • Coefficient bounds (e.g., lb ≤ β ≤ ub),
> > > > • No intercept,
> > > > • And a constraint like sum(β) = C,
> > > >
> > > > …you'll need to code your own objective function. Here's a
> > > > high-level outline of the approach:
> > > >
> > > > A. Model Setup
> > > > Let your design matrix X be n x p, and let Y take ordered values
> > > > 1, 2, ..., K.
> > > >
> > > > B. Parameters
> > > > Assume the thresholds (θ_k) are fixed (or removed entirely), and
> > > > you're estimating only the coefficient vector β. Your constraints
> > > > are:
> > > >
> > > > Box constraints: lb ≤ β ≤ ub
> > > > Equality constraint: sum(β) = C
> > > >
> > > > C. Likelihood
> > > > The probability of category k is given by:
> > > >
> > > > P(Y = k) = logistic(θ_k - Xβ) - logistic(θ_{k-1} - Xβ)
> > > >
> > > > Without intercepts, the cumulative form becomes:
> > > >
> > > > P(Y ≤ k) = 1 / (1 + exp(-(c_k - Xβ)))
> > > >
> > > > …where the c_k are fixed thresholds.
> > > >
> > > > Implementation using nloptr
> > > > This example shows a rough sketch using nloptr, which allows both
> > > > equality and bound constraints:
> > > >
> > > > > library(nloptr)
> > > > >
> > > > > # Custom negative log-likelihood function.
> > > > > # Assumes cutpoints is padded at both ends, i.e.
> > > > > # cutpoints <- c(-Inf, c_1, ..., c_{K-1}, Inf),
> > > > > # so the index is never out of range at k = 1 or k = K.
> > > > > negLL <- function(beta, X, y, K, cutpoints) {
> > > > >   eta <- X %*% beta
> > > > >   loglik <- 0
> > > > >   for (k in 1:K) {
> > > > >     pk <- plogis(cutpoints[k + 1] - eta) - plogis(cutpoints[k] - eta)
> > > > >     loglik <- loglik + sum(log(pk[y == k]))
> > > > >   }
> > > > >   return(-loglik)
> > > > > }
> > > > >
> > > > > # Constraint: sum(beta) = C
> > > > > sum_constraint <- function(beta, C) {
> > > > >   return(sum(beta) - C)
> > > > > }
> > > > >
> > > > > # Define objective and constraint wrappers
> > > > > objective <- function(beta) negLL(beta, X, y, K, cutpoints)
> > > > > eq_constraint <- function(beta) sum_constraint(beta, C = 2)  # example C
> > > > >
> > > > > # Run optimization. SLSQP is gradient-based, so nloptr needs
> > > > > # eval_grad_f and eval_jac_g_eq; nl.grad()/nl.jacobian() supply
> > > > > # them numerically. x0 must lie inside [lb, ub].
> > > > > res <- nloptr(
> > > > >   x0 = rep(0, ncol(X)),
> > > > >   eval_f = objective,
> > > > >   eval_grad_f = function(b) nl.grad(b, objective),
> > > > >   lb = lower_bounds,
> > > > >   ub = upper_bounds,
> > > > >   eval_g_eq = eq_constraint,
> > > > >   eval_jac_g_eq = function(b) nl.jacobian(b, eq_constraint),
> > > > >   opts = list(algorithm = "NLOPT_LD_SLSQP", xtol_rel = 1e-8)
> > > > > )
> > > >
> > > > The next step would be writing the actual log-likelihood for your
> > > > specific problem and verifying the constraint implementation.
> > > >
> > > > Manually coding the log-likelihood for an ordinal model is
> > > > nontrivial... so a bit of a challenge :)
> > > >
> > > > All the best,
> > > > gregg powell
> > > > Sierra Vista, AZ