Hi David,

Yes, that's the change I was referring to.

In our case the queries had subqueries using local params that worked under
the previous setup, but after the upgrade, with edismax as the default
parser, they were interpreted differently, which showed up as candidate-set
differences during query replay.
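
To make that concrete (the parser and field names below are just
illustrative, not our actual queries), consider a request like:

    defType=edismax
    q={!lucene df=title}solr

Before 7.2 the leading {!lucene ...} local-params switched the query
parser; from 7.2 onward (per SOLR-11501) that prefix is no longer honored
when defType is edismax, so the string is parsed by edismax itself and the
matched document set can differ.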

One tricky part was that this happened silently for us. I opened a JIRA
mainly to ask whether emitting a warning might help upgrading teams that
miss this in the upgrade notes:
https://issues.apache.org/jira/browse/SOLR-18151

The harness helped surface these cases quickly when comparing results
across versions.

Regards,
Parveen

On Sun, Mar 15, 2026 at 6:42 AM David Smiley <[email protected]> wrote:

> I believe most of what Parveen refers to was for the 7.x -> 8.x upgrade,
> coming out of Lucene.
>
> RE "subquery parsing" -- can you be more specific please?  I suspect you
> might refer to this, reading from the 7.2.1 upgrade notes (that anyone
> upgrading hopefully read):
>
> - Starting a query string with local-params {!myparser ...} is used to
> switch the query parser to another, and is intended for use by Solr system
> developers, not end users doing searches. To reduce negative side-effects
> of unintended hack-ability, we've limited the cases that local-params will
> be parsed to only contexts in which the default parser is "lucene" or
> "func". So if defType=edismax then q={!myparser ...} won't work. In that
> example, put the desired query parser into defType. Another example is if
> deftype=edismax then hl.q={!myparser ...} won't work for the same reason.
> In that example, either put the desired query parser into hl.qparser or set
> hl.qparser=lucene. Most users won't run into these cases but some will and
> must change. If you must have full backwards compatibility, use
> luceneMatchVersion=7.1.0 or something earlier. [SOLR-11501](
> https://issues.apache.org/jira/browse/SOLR-11501) (David Smiley)
>
> On Thu, Mar 12, 2026 at 4:31 PM Parveen Saini <[email protected]
> >
> wrote:
>
> > Ah - by contributors I meant contributing factors, not people :)
> >
> > And yes, completely agree with your point about open source. Being able
> to
> > inspect the code and experiment with changes made it much easier for us
> to
> > understand what was happening and adapt the system.
> >
> > Regards,
> > Parveen
> >
> > On Thu, Mar 12, 2026 at 12:55 PM Gus Heck <[email protected]> wrote:
> >
> > > I'm not particularly interested in contributors... just the code
> changes
> > > themselves. We're all doing our best, but if any of these changes had
> > > unexpected effects we should all learn from it (and if applicable
> > > reconsider if the change was worth the impact)
> > >
> > > Also as a side note, your story at a high level is: Things changed in a
> > way
> > > that impacted us, but since it was open source we were able to make the
> > > necessary changes to recover the important functionality. Imagine if
> this
> > > had happened with a closed source, commercial product.
> > >
> > > On Thu, Mar 12, 2026 at 3:37 PM Parveen Saini <
> > [email protected]
> > > >
> > > wrote:
> > >
> > > > Hi Gus,
> > > >
> > > > Thanks for taking the time to read the article.
> > > >
> > > > In our case we were able to trace a few contributors. One was
> > similarity
> > > > changes in Lucene, which shifted score distributions, so we ended up
> > > adding
> > > > a custom similarity to match the previous behavior. We also saw
> effects
> > > > from negative score handling in boost functions, and some score
> > > differences
> > > > for primitive fields due to similarity changes.
> > > >
> > > > Separately, we ran into a silent subquery parsing behavior change,
> > which
> > > > affected the candidate set rather than just scoring.
> > > >
> > > > The harness, in my original mail, includes a small sample corpus and
> > > > queries that show these differences side-by-side (even when using the
> > > same
> > > > similarity). Not suggesting these are bugs — just that the combined
> > > effects
> > > > made the upgrade behavior harder to reason about without tooling like
> > > this.
> > > >
> > > > Regards,
> > > > Parveen
> > > >
> > > > On Thu, Mar 12, 2026 at 5:40 AM Gus Heck <[email protected]> wrote:
> > > >
> > > > > Interesting article, though it leaves me wondering which changes or
> > > sets
> > > > of
> > > > > changes caused the problematic variations in behavior.
> > > > >
> > > > > On Thu, Mar 12, 2026 at 3:18 AM Parveen Saini <
> > > > [email protected]
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Following up on the earlier thread about the migration validation
> > > > > approach
> > > > > > for major Solr upgrades, I wrote up the full story and lessons
> > > learned
> > > > > from
> > > > > > the Solr 5 to 8 migration we discussed.
> > > > > >
> > > > > > Sharing here in case it’s useful for others planning similar
> > > upgrades:
> > > > > > https://dzone.com/articles/solr5-to-solr8-migration-ads-system
> > > > > >
> > > > > > Best,
> > > > > > Parveen
> > > > > >
> > > > > > On Sat, Feb 28, 2026 at 9:17 AM Parveen Saini <
> > > > > [email protected]
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > While migrating a production system from Solr 5 to 8, we
> > > encountered
> > > > > > > subtle ranking drift that did not surface in standard upgrade
> > > > testing.
> > > > > > The
> > > > > > > system functioned correctly, but ranking behavior changed in
> non
> > > > > obvious
> > > > > > > ways. We observed score distribution shifts, candidate set
> > > > differences
> > > > > > > influenced by negative score handling, and p99 latency
> > regressions.
> > > > > > >
> > > > > > > Using that migration as a real world case, I put together a
> small
> > > > side
> > > > > by
> > > > > > > side validation harness designed to make behavioral differences
> > > > across
> > > > > > > major Solr versions observable. The goal is not to provide
> > version
> > > > > > specific
> > > > > > > guidance, but to offer a structured approach for detecting
> > ranking
> > > > and
> > > > > > > performance drift during major upgrades.
> > > > > > >
> > > > > > > The harness compares docset overlap, score distributions, and
> > query
> > > > > level
> > > > > > > behavior across versions.
> > > > > > >
> > > > > > > Sharing in case it is useful for others planning major Solr
> > > upgrades:
> > > > > > >
> > https://github.com/parveensaini/solr-lucene-migration-correctness
> > > > > > >
> > > > > > > Happy to share more details or present the approach at an
> > upcoming
> > > > > > > community meetup if there is interest.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Parveen
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > http://www.needhamsoftware.com (work)
> > > > > https://a.co/d/b2sZLD9 (my fantasy fiction book)
> > > > >
> > > >
> > >
> > >
> > > --
> > > http://www.needhamsoftware.com (work)
> > > https://a.co/d/b2sZLD9 (my fantasy fiction book)
> > >
> >
>