Hello Christofer,

Just writing with a detailed follow-up. Attached is a script I was able to get 
running with a bit of work. The script is attached only; I have not included it 
inline in the text of this email.

Optimization Progress
We were initially aiming to solve the dual-slope constrained ordinal model 
using nloptr's SLSQP algorithm (NLOPT_LD_SLSQP), since it supports:
• Box constraints (per-β lower/upper bounds)
• Equality constraints (sum(β) = 1.60 for each β set)

However, SLSQP requires explicit gradient definitions for both the objective 
function and constraint Jacobian. While the documentation suggests that 
eval_grad_f = NULL should trigger finite-difference approximation, this did not 
work in my initial script — I received repeated errors:
“A gradient for the objective function is needed by algorithm NLOPT_LD_SLSQP 
but was not supplied.”
Even when using wrapper functions with NA values for gradient placeholders, the 
optimizer failed to recognize them. 
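If we ever revisit SLSQP, one workaround I'd try (untested here) is having the 
objective return its own finite-difference gradient via the numDeriv package; 
nloptr accepts an eval_f that returns a list with "objective" and "gradient" 
elements. A minimal sketch, with negLL2 standing in for the attached script's 
objective:

library(nloptr)
library(numDeriv)

# Sketch: wrap a derivative-free objective so SLSQP gets a gradient.
# negLL2 is a placeholder for the dual-slope negative log-likelihood.
obj_with_grad <- function(par) {
  list(
    objective = negLL2(par),
    gradient  = numDeriv::grad(negLL2, par)  # finite-difference gradient
  )
}

The equality constraint would need the same treatment (a "jacobian" element 
returned alongside "constraints").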


After multiple diagnostic attempts and some research, I shifted to:
NLOPT_LN_COBYLA — Derivative-Free Optimization
This algorithm supports:
• Equality constraints (sum(β¹) = 1.60 and sum(β²) = 1.60)
• Bound constraints (per-element β limits)

It also requires no gradients, which made it the best practical option here.
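
For orientation, the core call in the attached script looks roughly like the 
sketch below (negLL2 again stands in for the dual-slope negative 
log-likelihood; the parameter vector is β¹ and β² stacked):

res <- nloptr::nloptr(
  x0 = rep(1, 6),                   # start within the box constraints
  eval_f = negLL2,                  # no gradient required by COBYLA
  lb = rep(c(1, -1, 0), 2),         # per-element lower bounds, β¹ then β²
  ub = rep(c(2, 1, 1), 2),          # per-element upper bounds
  eval_g_eq = function(b) c(sum(b[1:3]) - 1.60, sum(b[4:6]) - 1.60),
  opts = list(algorithm = "NLOPT_LN_COBYLA", xtol_rel = 1e-8,
              maxeval = 5000)
)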

Results
The optimizer successfully converged (status: NLOPT_XTOL_REACHED) after 202 
iterations.

Estimated coefficients:
• β¹ (for P(Y ≤ 1)): c(1.0, -0.4, 1.0)
• β² (for P(Y ≤ 2)): c(1.0, 0.6, 3.25e-19)

Interpretation:
• Both β vectors satisfy their respective sum constraint (≈ 1.60).
• All elements are within their specified bounds.
• The third coefficient for β² (gpa) hits its lower bound (0), suggesting 
minimal influence on the second cumulative probability cutoff.
• The model does not estimate cutpoints/intercepts explicitly. Instead, the two 
separate slope vectors approximate this ordinal structure.
This confirms that your structure (estimating distinct slopes per class 
boundary, without intercepts) is now functioning with all constraints enforced.
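
A quick sanity check on those reported numbers:

beta1 <- c(1.0, -0.4, 1.0)
beta2 <- c(1.0, 0.6, 3.25e-19)
sum(beta1)   # 1.6
sum(beta2)   # 1.6 (third element is numerically zero)
all(beta1 >= c(1, -1, 0) & beta1 <= c(2, 1, 1))   # TRUE
all(beta2 >= c(1, -1, 0) & beta2 <= c(2, 1, 1))   # TRUE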

________________________________________
Also, we could take this a step further. Let me know if you'd be interested in:
• A function to plot fitted probabilities per class (based on the estimated β¹ 
and β²),
• A summary write-up of this implementation stage (as a formal hand-off or 
documentation),
• A switch to alabama::auglag(), which allows both equality and bound 
constraints and uses augmented Lagrangian methods, possibly offering improved 
precision with gradient-based control (rough sketch below).

Happy to assist with any of those, as time permits - perhaps next week.

r/
Gregg

On Thursday, May 1st, 2025 at 1:01 PM, Gregg Powell via R-help 
<r-help@r-project.org> wrote:

> 

> 

> Hello Christofer,
> 

> That’s good information. Thanks for the precision.
> I think I can use this now to work on a script:
> 

> Based on what you provided:
> 1. Two distinct sets of coefficients, β¹ and β², each associated with the 
> logits for:
> • P(Y ≤ 1)
> • P(Y ≤ 2)
> 

> 2. Separate sum constraints:
> • ∑β¹ = 1.60
> • ∑β² = 1.60
> 

> 3. Element-wise bounds applied to each β set:
> • lower = c(1, -1, 0)
> • upper = c(2, 1, 1)
> 

> Planning to wrap this into an updated nloptr optimization routine. I’ll work 
> on a .R script as time permits — if all goes well, you should be able to run 
> it as provided.
> 

> r/
> Gregg
> 

> 

> 

> 

> 

> On Wednesday, April 30th, 2025 at 3:23 AM, Christofer Bogaso 
> bogaso.christo...@gmail.com wrote:
> 

> > Hi Gregg,
> 

> > Below I try to address your questions:
> 

> > 1) The sum constraint would apply for each set β¹ and β² i.e. sum(β¹)
> > = sum(β²) = 1.60
> > 2) Just like 1) the lower and upper bounds will be applied for
> > individual set i.e. individual elements of β¹ are subject to lower =
> > c(1, -1, 0) and upper = c(2, 1, 1) and individual elements of β² are
> > subject to lower = c(1, -1, 0) and upper = c(2, 1, 1)
> 

> > I hope that I am able to answer your questions. Please let me know if
> > further information is required.
> 

> > Thanks and regards,
> 

> > On Wed, Apr 30, 2025 at 4:22 AM Gregg Powell g.a.pow...@protonmail.com 
> > wrote:
> 

> > > Hello again Christofer,
> > > This clarification changes the model structure somewhat significantly - it 
> > > shifts us from a standard cumulative logit model with proportional odds 
> > > to a non-parallel cumulative logit model, where each threshold has its 
> > > own set of β coefficients. At least, that is now my understanding.
> > > So, instead of a single β vector shared across all class boundaries, 
> > > you're now specifying:
> > > • One set of coefficients β¹ for the logit of P(Y ≤ 1),
> > > • A second, distinct set β² for P(Y ≤ 2),
> > > • And no intercepts, meaning the threshold-specific slope vectors must 
> > > carry all the signal.
> 

> > > So, we can adjust the log-likelihood accordingly:
> > > P(Y = 1) = logistic(Xβ¹)
> > > P(Y = 2) = logistic(Xβ²) - logistic(Xβ¹)
> > > P(Y = 3) = 1 - logistic(Xβ²)
> 

> > > Before I attempt a revised script, can you confirm:
> > > 1. Should the sum constraint (e.g., sum(β) = 1.60) apply to:
> > > • Only β¹?
> > > • Only β²?
> > > • Or the sum of all 6 coefficients (β¹ and β² combined)?
> 

> > > 2. Do you want to apply separate lower/upper bounds to each of the six β 
> > > coefficients (and if so, what are they for each)?
> 

> > > Once I understand this last part better, I’ll see about working on a 
> > > version that fits this updated structure and constraint logic.
> > > As always – no promises.
> > > r/
> > > Gregg Powell
> > > Sierra Vista, AZ
> 

> > > On Tuesday, April 29th, 2025 at 1:51 PM, Christofer Bogaso 
> > > bogaso.christo...@gmail.com wrote:
> 

> > > > Hi Gregg,
> 

> > > > I am just wondering if you get any time to think about this problem.
> 

> > > > As I understand, typically for this type of problem, we have different
> > > > intercepts for different classes, while slope (beta) coefficients
> > > > remain the same across classes.
> 

> > > > But in my case, since we do not have any intercept term, the slope
> > > > coefficients need to be estimated separately for different classes.
> > > > Therefore, since we have 3 classes in the response variable (i.e.
> > > > 'apply'), there will be 3 different sets of beta-coefficients for the
> > > > independent variables.
> 

> > > > Under this situation, I wonder how I can create the likelihood
> > > > function subject to all applicable constraints.
> 

> > > > Any insight would be highly appreciated.
> 

> > > > Thanks and regards,
> 

> > > > On Fri, Apr 25, 2025 at 12:31 AM Gregg Powell g.a.pow...@protonmail.com 
> > > > wrote:
> 

> > > > > Christofer,
> > > > > That was a detailed follow-up; you clarified the requirements 
> > > > > precisely enough to give me a position to proceed from.
> 

> > > > > To implement this, we need a constrained optimization procedure to 
> > > > > estimate an ordinal logistic regression model (cumulative logit), 
> > > > > based on:
> 

> > > > > 1. Estimated Cutpoints
> > > > > – No intercept in the linear predictor
> > > > > – Cutpoints (thresholds) will be estimated directly from the data
> 

> > > > > 2. Coefficient Constraints
> > > > > – Box constraints on each coefficient
> > > > > – For now:
> > > > > lower = c(1, -1, 0)
> > > > > upper = c(2, 1, 1)
> > > > > – These apply to: pared, public, gpa (in that order)
> 

> > > > > 3. Sum Constraint
> > > > > – The sum of coefficients must equal 1.60
> 

> > > > > 4. Data to use...
> > > > > – Use the IDRE ologit.dta dataset from UCLA (for now)
> 

> > > > > Technical Approach
> 

> > > > > • Attempt to write a custom negative log-likelihood function using 
> > > > > the cumulative logit formulation:
> 

> > > > > P(Y ≤ k | X) = 1 / (1 + exp[-(θ_k - Xβ)])
> 

> > > > > and derive P(Y = k) from adjacent differences of these.
> 

> > > > > • Cutpoints θ_k will be estimated as separate parameters, with 
> > > > > constraints to ensure they're strictly increasing for identifiability.
> 

> > > > > • The optimization will use nloptr::nloptr(), which supports:
> > > > > - Lower/upper bounds (box constraints)
> > > > > - Equality constraints (for sum of β)
> > > > > - Nonlinear inequality constraints (to keep cutpoints ordered)
> 

> > > > > I’ll have some time later - in the next day or two to attempt a 
> > > > > script with:
> > > > > - Custom negative log-likelihood
> > > > > - Parameter vector split into β and cutpoints
> > > > > - Constraint functions: sum(β) = 1.60 and increasing cutpoints (sketch below)
> > > > > - Optimization via nloptr()
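> > > > > For the ordering constraint, a rough sketch in nloptr's inequality 
> > > > > form (g(x) <= 0), assuming (hypothetically) that the two cutpoints 
> > > > > sit at the end of the parameter vector:
> > > > > 
> > > > > ineq_constraint <- function(par) {
> > > > >   theta <- tail(par, 2)       # hypothetical layout: cutpoints last
> > > > >   theta[1] - theta[2] + 1e-6  # <= 0 keeps θ1 strictly below θ2
> > > > > }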
> 

> > > > > Hopefully, you’ll be able to run it locally with only the VGAM, 
> > > > > foreign, and nloptr packages.
> 

> > > > > I’ll send the .R file along with the next email. A best attempt, 
> > > > > anyway.
> 

> > > > > r/
> > > > > Gregg
> 

> > > > > “Oh, what fun it is!”
> > > > > —Alice, Alice’s Adventures in Wonderland by Lewis Carroll
> 

> > > > > On Thursday, April 24th, 2025 at 1:56 AM, Christofer Bogaso 
> > > > > bogaso.christo...@gmail.com wrote:
> 

> > > > > > Hi Gregg,
> 

> > > > > > Many thanks for your detailed feedback; those are really super 
> > > > > > helpful.
> 

> > > > > > I have ordered a copy of Agresti from our local library, however it
> > > > > > appears that I would need to wait for a few days.
> 

> > > > > > Regarding my queries, it would be super helpful if we can build an
> > > > > > optimization algorithm based on the below criteria:
> 

> > > > > > 1. Whether you want the cutpoints fixed (and to what values), or if
> > > > > > you want them estimated separately (with identifiability managed 
> > > > > > some
> > > > > > other way); I would like to have cut-points directly estimated from
> > > > > > the data
> > > > > > 2. What your bounds on the coefficients are (lower/upper vectors):
> > > > > > for this question, I have different bounds for each of the
> > > > > > coefficients, so I would have one vector of lower bounds and another
> > > > > > vector of upper bounds. To start with, let's consider
> > > > > > lower = c(1, -1, 0) and upper = c(2, 1, 1) for the variables pared,
> > > > > > public, and gpa. In my model there would not be any intercept, hence
> > > > > > no question of its bounds
> > > > > > 3. What value the sum of coefficients should equal (e.g., sum(β) = 
> > > > > > 1,
> > > > > > or something else); Let the sum be 1.60
> > > > > > 4. And whether you're working with the IDRE example data, or
> > > > > > something else. My original data actually resides on a computer
> > > > > > without any access to the internet (for data security). However, we
> > > > > > can start with any suitable data for this experiment, so one such
> > > > > > dataset may be https://stats.idre.ucla.edu/stat/data/ologit.dta. I am
> > > > > > still exploring whether it is possible to extract a randomized copy
> > > > > > of the actual data; if I can, I will share it once available
> 

> > > > > > Again, many thanks for your time and insights.
> 

> > > > > > Thanks and regards,
> 

> > > > > > On Wed, Apr 23, 2025 at 9:54 PM Gregg Powell 
> > > > > > g.a.pow...@protonmail.com wrote:
> 

> > > > > > > Hello again Christofer,
> > > > > > > Thanks for your thoughtful note — I’m glad the outline was 
> > > > > > > helpful. Apologies for the long delay getting back to you. Had to 
> > > > > > > do a small bit of research…
> 

> > > > > > > Recommended Text on Ordinal Log-Likelihoods:
> > > > > > > You're right that most online sources jump straight to code or 
> > > > > > > canned functions. For a solid theoretical treatment of ordinal 
> > > > > > > models and their likelihoods, consider:
> > > > > > > "Categorical Data Analysis" by Alan Agresti
> > > > > > > – Especially Chapters 7 and 8 on ordinal logistic regression.
> > > > > > > – Covers proportional odds models, cumulative logits, 
> > > > > > > adjacent-category logits, and the derivation of likelihood 
> > > > > > > functions.
> > > > > > > – Provides not only equations but also intuition behind the model 
> > > > > > > structure.
> > > > > > > It’s a standard reference in the field and explains how to build 
> > > > > > > log-likelihoods from first principles — including how the 
> > > > > > > cumulative probabilities arise and why the difference of CDFs 
> > > > > > > represents a category-specific probability.
> > > > > > > Also helpful:
> > > > > > > "An Introduction to Categorical Data Analysis" by Agresti (2nd 
> > > > > > > ed) – A bit more accessible, and still covers the basics of 
> > > > > > > ordinal models.
> > > > > > > ________________________________________
> 

> > > > > > > If You Want to Proceed Practically:
> > > > > > > In parallel with theory, if you'd like a working R example coded 
> > > > > > > from scratch — with:
> > > > > > > • a custom likelihood for an ordinal (cumulative logit) model,
> > > > > > > • fixed thresholds / no intercept,
> > > > > > > • coefficient bounds,
> > > > > > > • and a sum constraint on β
> 

> > > > > > > I’d be happy to attempt that using nloptr() or constrOptim(). 
> > > > > > > You’d be able to walk through it and tweak it as necessary (no 
> > > > > > > guarantee that I will get it right 😊)
> 

> > > > > > > Just let me know:
> > > > > > > 1. Whether you want the cutpoints fixed (and to what values), or 
> > > > > > > if you want them estimated separately (with identifiability 
> > > > > > > managed some other way);
> > > > > > > 2. What your bounds on the coefficients are (lower/upper vectors),
> > > > > > > 3. What value the sum of coefficients should equal (e.g., sum(β) 
> > > > > > > = 1, or something else);
> > > > > > > 4. And whether you're working with the IDRE example data, or 
> > > > > > > something else.
> 

> > > > > > > I can use the UCLA ologit.dta dataset as a basis if that's 
> > > > > > > easiest to demo on, or if you have another dataset you’d prefer – 
> > > > > > > again, let me know.
> 

> > > > > > > All the best!
> 

> > > > > > > gregg
> 

> > > > > > > On Monday, April 21st, 2025 at 11:25 AM, Christofer Bogaso 
> > > > > > > bogaso.christo...@gmail.com wrote:
> 

> > > > > > > > Hi Gregg,
> 

> > > > > > > > I am sincerely thankful for this workout.
> 

> > > > > > > > Could you please suggest any text book on how to create 
> > > > > > > > log-likelihood
> > > > > > > > for an ordinal model like this? Most of my online search point 
> > > > > > > > me
> > > > > > > > directly to some R function etc, but a theoretical discussion 
> > > > > > > > on this
> > > > > > > > subject would be really helpful to construct the same.
> 

> > > > > > > > Thanks and regards,
> 

> > > > > > > > On Mon, Apr 21, 2025 at 9:55 PM Gregg Powell 
> > > > > > > > g.a.pow...@protonmail.com wrote:
> 

> > > > > > > > > Christofer,
> 

> > > > > > > > > Given the constraints you mentioned—bounded parameters, no 
> > > > > > > > > intercept, and a sum constraint—you're outside the 
> > > > > > > > > capabilities of most off-the-shelf ordinal logistic 
> > > > > > > > > regression functions in R like vglm or polr.
> 

> > > > > > > > > The most flexible recommendation at this point is to 
> > > > > > > > > implement custom likelihood optimization using constrOptim() 
> > > > > > > > > or nloptr::nloptr() with a manually coded log-likelihood 
> > > > > > > > > function for the cumulative logit model.
> 

> > > > > > > > > Since you need:
> 

> > > > > > > > > Coefficient bounds (e.g., lb ≤ β ≤ ub),
> 

> > > > > > > > > No intercept,
> 

> > > > > > > > > And a constraint like sum(β) = C,
> 

> > > > > > > > > …you'll need to code your own objective function. Here's 
> > > > > > > > > something of a high-level outline of the approach:
> 

> > > > > > > > > A. Model Setup
> > > > > > > > > Let your design matrix X be n x p, and let Y take ordered 
> > > > > > > > > values 1, 2, ..., K.
> 

> > > > > > > > > B. Parameters
> > > > > > > > > Assume the thresholds (θ_k) are fixed (or removed entirely), 
> > > > > > > > > and you’re estimating only the coefficient vector β. Your 
> > > > > > > > > constraints are:
> 

> > > > > > > > > Box constraints: lb ≤ β ≤ ub
> 

> > > > > > > > > Equality constraint: sum(β) = C
> 

> > > > > > > > > C. Likelihood
> > > > > > > > > The probability of category k is given by:
> 

> > > > > > > > > P(Y = k) = logistic(θ_k - Xβ) - logistic(θ_{k-1} - Xβ)
> 

> > > > > > > > > Without intercepts, this becomes more like:
> 

> > > > > > > > > P(Y ≤ k) = 1 / (1 + exp(-(c_k - Xβ)))
> 

> > > > > > > > > …where c_k are fixed thresholds.
> 

> > > > > > > > > Implementation using nloptr
> > > > > > > > > This example shows a rough sketch using nloptr, which allows 
> > > > > > > > > both equality and bound constraints:
> 

> > > > > > > > > > library(nloptr)
> > > > > > > > > > 
> > > > > > > > > > # Custom negative log-likelihood for the cumulative logit model.
> > > > > > > > > > # cutpoints has length K - 1; pad with -Inf/Inf so that
> > > > > > > > > > # P(Y <= 0) = 0 and P(Y <= K) = 1.
> > > > > > > > > > negLL <- function(beta, X, y, K, cutpoints) {
> > > > > > > > > >   eta <- X %*% beta
> > > > > > > > > >   cp <- c(-Inf, cutpoints, Inf)
> > > > > > > > > >   loglik <- 0
> > > > > > > > > >   for (k in 1:K) {
> > > > > > > > > >     pk <- plogis(cp[k + 1] - eta) - plogis(cp[k] - eta)
> > > > > > > > > >     loglik <- loglik + sum(log(pk[y == k]))
> > > > > > > > > >   }
> > > > > > > > > >   return(-loglik)
> > > > > > > > > > }
> > > > > > > > > > 
> > > > > > > > > > # Equality constraint: sum(beta) - C, driven to 0 by nloptr
> > > > > > > > > > sum_constraint <- function(beta, C) {
> > > > > > > > > >   return(sum(beta) - C)
> > > > > > > > > > }
> > > > > > > > > > 
> > > > > > > > > > # Objective and constraint wrappers (X, y, K, cutpoints,
> > > > > > > > > > # lower_bounds, upper_bounds assumed defined in the session)
> > > > > > > > > > objective <- function(beta) negLL(beta, X, y, K, cutpoints)
> > > > > > > > > > eq_constraint <- function(beta) sum_constraint(beta, C = 2)  # example C
> > > > > > > > > > 
> > > > > > > > > > # Run optimization (x0 should lie within the bounds)
> > > > > > > > > > res <- nloptr(
> > > > > > > > > >   x0 = rep(0, ncol(X)),
> > > > > > > > > >   eval_f = objective,
> > > > > > > > > >   lb = lower_bounds,
> > > > > > > > > >   ub = upper_bounds,
> > > > > > > > > >   eval_g_eq = eq_constraint,
> > > > > > > > > >   opts = list(algorithm = "NLOPT_LD_SLSQP", xtol_rel = 1e-8)
> > > > > > > > > > )
> 

> > > > > > > > > The next step would be writing the actual log-likelihood for 
> > > > > > > > > your specific problem or verifying constraint implementation.
> 

> > > > > > > > > Manually coding the log-likelihood for an ordinal model is 
> > > > > > > > > nontrivial... so a bit of a challenge :)
> 

> > > > > > > > > All the best,
> > > > > > > > > gregg powell
> > > > > > > > > Sierra Vista, AZ
> 



______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
